control problem without noise-intensity variations, the first-order optimal control law modifications
here are found to be the inclusion of the state covariance matrix as a measurement-driven variable in
the state estimator, the appearance of deterministic skewness variables in this estimator, and the
addition of a deterministic term to the control. Part of the additive control term can be interpreted
as the “dual control” effect, and it is coupled to the control through a matrix time function whose
properties are investigated. A useful refinement of the certainty-equivalence principle is made. When
the measurement noise is state-dependent, the differential equations for state estimation have a random
driving term containing the scatter matrix of the measurements, which imposes some additional
restrictions on the validity of the analysis. Some aspects of the results are shown to generalize to the
case of a quadratic exponential criterion, although that situation is more complicated. A method for
including the effects of noise intensity gradients in iterative optimization algorithms is described,
and a numerical example is given.

DUAL PERTURBATION CONTROL


INTRODUCTION 

One  problem  of  considerable  interest  in  stochastic  optimal  control  theory  is  that  of 
minimizing  the  expected  value  of  a quadratic  criterion  in  the  presence  of  linear  dynamics 
and  state  measurements,  both  of  which  are  perturbed  by  additive  Gaussian  white  noise 
processes  whose  parameters  are  known  a priori.  This  classical  “linear-quadratic-Gaussian” 
case  is  important  because  it  is  both  analytically  tractable  and  descriptive  of  noise-induced 
perturbations  from  nominal  behavior  in  a more  general  class  of  optimal  control  problems 
[1].  However,  it  has  the  simplifying  but  degenerate  property  that  the  optimal  control 
law  is  the  functional  composition  of  the  solutions  to  a deterministic  optimal  control 
problem  and  a state  estimation  problem  that  are  essentially  independent  of  each  other 
(the  certainty-equivalence  property).  The  only  effect  the  choice  of  control  has  on  the 
state  estimation  results  is  to  shift,  by  a known  amount,  the  mean  of  the  conditional  state 
distribution. Heuristically, this means that in this case the acquisition and exploitation
of  state  information  are  independent. 

Some  analogous  results  have  been  obtained  recently  [2]  for  a variant  of  this  problem 
in  which  the  criterion  is  changed  to  a quadratic  exponential,  as  it  might  be  in  minimizing 
a terminal  miss  distance  and  the  probability  of  a control-dependent  Poisson  failure.  The 
certainty-equivalence  property  does  not  extend  to  this  case,  but  there  is  still  no  conflict 
between  the  acquisition  and  exploitation  of  state  information  because  the  estimation 
results  are  control-independent  here  also.  There  is  such  a conflict  in  the  general  stochastic 
optimal  control  problem,  however,  where  the  “quality”  of  the  state  estimate  can  be 
influenced  by  the  choice  of  control.  The  optimal  control  law  in  such  cases  can  therefore 
be interpreted as having a dual character [3]; it represents an optimal compromise between
acquiring  and  exploiting  state  information  for  the  ultimate  purpose  of  minimizing  the 
criterion. 

This dual character is investigated here by considering an extension of the linear-quadratic-Gaussian
problem, in which the noise “covariance” matrixes vary as functions of the instantaneous state and
control. An exact solution is not attempted, but a dynamic programming approach provides an explicit
expression for the optimal control law (in terms of initial value systems of ordinary differential
equations) which is accurate to first order in the covariance matrix variations under the restriction
that they remain small (and linear as functions of the state and control). Such results at least show
how the optimal control law starts to be affected in this particular context when the choice of control
begins to influence the quality of the state estimate, and this provides a starting point for
speculation about these effects in a more general context. Hence, this might be called a
“linear-quadratic-Gaussian infinitesimal” control problem. For such problems arising from a
perturbation analysis of a more general situation, however, the restriction of smallness here presents
little additional loss of generality, and the level of accuracy is compatible with that of the original
analysis. Furthermore, some of the phenomena appear to generalize at least to the case of the quadratic
exponential criterion, although the lack of a certainty-equivalence property complicates their
interpretation in that case.


NOTATION 

Unless otherwise noted, lower case letters are used here to denote (real) column vectors or scalars,
and capital Roman letters denote matrixes. For a matrix $A$, $A^T$ denotes the transpose of $A$. If $A$
is square, $|A|$ denotes its determinant, $\mathrm{tr}(A)$ denotes its trace

$$\mathrm{tr}(A) = \sum_i A_{ii},$$

and $\mathrm{adj}(A)$ denotes its adjoint, a square matrix whose $i,j$th component is the $j,i$th
cofactor of $A$. If $a$ is a vector, each component of which is a function of another vector $x$, then
$a_x$ denotes the matrix of partial derivatives such that $(a_x)_{ij} = \partial a_i/\partial x_j$. If
$a$ is a scalar, then $a_A$ denotes the matrix of partial derivatives such that
$(a_A)_{ij} = \partial a/\partial A_{ij}$.

It will also be necessary here to manipulate three-way matrixes of real numbers, which will always be
denoted by capital Greek letters. For continuity of notation, we adopt the following definitions for
such a matrix $\Gamma = \{\Gamma_{ijk} : i = 1, \ldots, I;\ j = 1, \ldots, J;\ k = 1, \ldots, K\}$:

Postmultiplication by a column $K$-vector $x$ gives an $I \times J$ matrix such that

$$(\Gamma x)_{ij} = \sum_{k=1}^{K} \Gamma_{ijk} x_k .$$

Premultiplication by an $N \times I$ matrix $A$ gives an $N \times J \times K$ three-way matrix such that

$$(A\Gamma)_{njk} = \sum_{i=1}^{I} A_{ni} \Gamma_{ijk} .$$

Postmultiplication by a $K \times N$ matrix $B$ gives an $I \times J \times N$ three-way matrix such that

$$(\Gamma B)_{ijn} = \sum_{k=1}^{K} \Gamma_{ijk} B_{kn} .$$

The transpose of $\Gamma$ is a $K \times I \times J$ three-way matrix $\Gamma'$ such that
$(\Gamma')_{kij} = \Gamma_{ijk}$. If $I = K$, $\mathrm{Tr}(\Gamma)$ is a column $J$-vector such that

$$[\mathrm{Tr}(\Gamma)]_j = \sum_{i=1}^{I} \Gamma_{iji} .$$

$\Gamma$ is called symmetric if $\Gamma = \Gamma' = \Gamma''$ and $\Gamma_{ijk} = \Gamma_{jik}$
($\Rightarrow I = J = K$).

$Ax^T$ denotes a three-way matrix such that $(Ax^T)_{ijk} = A_{ij} x_k$.

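With these definitions, the expressions $A\Gamma Bx$, $BAx^T$, $Ax^TB$ and $Ax^Ty$ are unambiguous.
Many other consequences are obvious. Some useful properties that are not so obvious are listed below.

$$\mathrm{tr}(\Gamma' x) = [\mathrm{Tr}(\Gamma)]^T x \quad\text{and}\quad \frac{\partial}{\partial x}[\mathrm{tr}(\Gamma' x)] = [\mathrm{Tr}(\Gamma)]^T$$

$$A\,\mathrm{Tr}(\Gamma') = \mathrm{Tr}[(A\Gamma)'] \quad\text{and}\quad B^T\mathrm{Tr}(\Gamma'') = \mathrm{Tr}[(\Gamma B)'']$$

$$\mathrm{Tr}(A\Gamma) = \mathrm{Tr}(\Gamma A)$$

$$(\Gamma B)' = B^T\Gamma' \quad\text{and}\quad \Gamma'' B = (B^T\Gamma)''$$

$$\Gamma' x B = (B^T\Gamma)' x$$

$$\Gamma x x = \mathrm{Tr}(\Gamma' x x^T)$$

$$\Gamma \text{ symmetric} \Rightarrow (A^T\Gamma A)'A \text{ and } B^T(B^T\Gamma B)'' \text{ symmetric}$$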
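The three-way matrix operations defined above map directly onto index contractions, so a few of the listed identities can be spot-checked numerically. The following is a minimal sketch using NumPy's `einsum`; the function names (`post_vec`, `pre_mat`, `post_mat`, `transpose3`, `Tr3`) are illustrative choices and do not come from the original text.

```python
import numpy as np

def post_vec(G, x):
    """(Gamma x)_ij = sum_k Gamma_ijk x_k."""
    return np.einsum("ijk,k->ij", G, x)

def pre_mat(A, G):
    """(A Gamma)_njk = sum_i A_ni Gamma_ijk."""
    return np.einsum("ni,ijk->njk", A, G)

def post_mat(G, B):
    """(Gamma B)_ijn = sum_k Gamma_ijk B_kn."""
    return np.einsum("ijk,kn->ijn", G, B)

def transpose3(G):
    """(Gamma')_kij = Gamma_ijk (cyclic shift of the three indices)."""
    return np.transpose(G, (2, 0, 1))

def Tr3(G):
    """[Tr(Gamma)]_j = sum_i Gamma_iji (requires first and third dimensions equal)."""
    return np.einsum("iji->j", G)

rng = np.random.default_rng(0)
n = 3
G = rng.normal(size=(n, n, n))
A = rng.normal(size=(n, n))
B = rng.normal(size=(n, n))
x = rng.normal(size=n)

# tr(Gamma' x) = [Tr(Gamma)]^T x
lhs1, rhs1 = np.trace(post_vec(transpose3(G), x)), Tr3(G) @ x
# Gamma x x = Tr(Gamma' x x^T), where (M x^T)_ijk = M_ij x_k
lhs2 = post_vec(G, x) @ x
rhs2 = Tr3(np.einsum("ij,k->ijk", post_vec(transpose3(G), x), x))
# (Gamma B)' = B^T Gamma'
lhs3, rhs3 = transpose3(post_mat(G, B)), pre_mat(B.T, transpose3(G))
# Tr(A Gamma) = Tr(Gamma A)
lhs4, rhs4 = Tr3(pre_mat(A, G)), Tr3(post_mat(G, A))
```

Each `lhs`/`rhs` pair should agree to rounding error for arbitrary real inputs of compatible dimensions.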

Parentheses are omitted in this notation if the order of multiplicative association is immaterial or if
the interpretation is unambiguous; for example, $\Gamma x B$ must mean $(\Gamma x)B$ because $xB$ is not
defined. If $x$ is a vector and $a$ is a scalar, then $a_{Ax}$ denotes the three-way matrix of second
partial derivatives such that

$$(a_{Ax})_{ijk} = \partial^2 a/\partial A_{ij}\,\partial x_k .$$

The probability density function of a random variable $x$ is denoted by $p_x(\cdot)$ and the
corresponding expectation operator by $E_x$. Where the meaning is clear from the context, $p(x)$,
$E(x)$, and $E(x/y)$ are often used as abbreviations for $p_x(x)$, $E_x(x)$ and $E_{x/y}(x, y)$.
The covariance of $x$ is denoted by $\mathrm{cov}(x)$.


PROBLEM  FORMULATION  AND  MOTIVATION 

The  problem  of  primary  interest  here  is  the  following  extension  of  the  familiar 
linear-quadratic-Gaussian  optimal  control  problem,  in  which  the  covariance  matrixes  of 
the  process  and  measurement  noises  are  allowed  to  have  a certain  kind  of  dependence  on 
the  state  and  control  vectors: 


$$\dot{x} = Fx + Gu + w; \qquad x(t_0) \text{ is Normal}(\hat{x}_0, P_0) \text{ a priori} \qquad \text{(dynamics)} \tag{1}$$

$$z = Hx + v \qquad \text{(state measurements)} \tag{2}$$

$$J = \tfrac{1}{2} E\left[x^T(t_f) S_f x(t_f) + \int_{t_0}^{t_f} (x^TAx + u^TBu + 2c^Tu)\,dt\right] \qquad \text{(scalar criterion)} \tag{3}$$

where time argument $t$ is suppressed in the notation and

$E$ denotes prior expected value

$x$ is an $n$-dimensional state vector

$u$ is an $m$-dimensional control vector

$z$ is a $k$-dimensional state measurement

$w$ and $v$ are independent zero-mean Gaussian white noise processes with respective covariance matrix
parameters $Q + 2\Gamma'u + 2\Psi'x$ and $R + 2\Omega'u$, given $u$ and $x$

$A$, $Q$, and $S_f$ are symmetric positive-semidefinite matrixes

$B$ and $R$ are symmetric positive-definite matrixes

All components of $\Gamma$, $\Psi$, and $\Omega$ (which may be time-varying) are approximately
infinitesimal, let us say of order $h$; all other quantities are of order unity, including $B^{-1}$
and $R^{-1}$

$\Gamma'_{ijk} = \Gamma'_{jik}$, $\Psi'_{ijk} = \Psi'_{jik}$, and $\Omega'_{ijk} = \Omega'_{jik}$ (to
retain covariance matrix symmetry).

An alternate criterion of exponential form is also considered for comparison, but discussion of this
is deferred until later. Including state dependence in the measurement noise covariance matrix
presents special difficulties and is considered separately.


The objective here is to determine, at least to first order in $h$, the control law that minimizes
criterion $J$. As usual, a control law is defined as a decision rule that determines the control $u(t)$
as a function of the available measurement history $Z(t) = \{[s, z(s)] : s \in [t_0, t)\}$. Since white
noise processes do not really exist except as a kind of shorthand notation for sequences of
approximating step-function processes, the control law sought here should really be interpreted as a
limiting form of the solutions to a sequence of restricted optimal control problems in which the
control and noise values change only at a finite number of specified intermediate times, where the
maximum time interval between such changes goes to zero in the control problem sequence.


The development here is formal, however, in the sense that no investigation is made of the conditions
under which such a limit concept is meaningful. The reason for treating the problem in continuous time
here is the more concise form of the results, together with the fact that they can serve as a single
approximation to the results for any approximating discrete-time problem with a short enough
discretization interval. As usual, the process noise covariance matrix for such a discretization
interval of generic length $\Delta$ and index $i$ is normalized as

$$\frac{1}{\Delta}\left[Q(t_0 + i\Delta) + 2\Gamma'(t_0 + i\Delta)u_i + 2\Psi'(t_0 + i\Delta)x_i\right]$$

so that the statistics of the noise increments on that interval are asymptotically the same; i.e.,

$$\mathrm{cov}[w_i\Delta] \to \mathrm{cov}\left\{\int_{t_0 + i\Delta}^{t_0 + (i+1)\Delta} w(t)\,dt\right\} \quad \text{as } \Delta \to 0$$

and similarly for the measurement noise covariance matrix.
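The normalization above can be illustrated with a short sketch: the step-function noise value $w_i$ has covariance equal to the noise intensity divided by $\Delta$, so the increment $w_i\Delta$ has covariance equal to the intensity times $\Delta$. The function name `discrete_noise_cov` and the numerical values are hypothetical, chosen only for illustration; the arrays `Gamma_t` and `Psi_t` stand for the transposed three-way matrixes $\Gamma'$ and $\Psi'$.

```python
import numpy as np

def discrete_noise_cov(Q, Gamma_t, Psi_t, u, x, dt):
    """Covariance of the step-function process noise value held over an interval dt.

    Gamma_t and Psi_t play the role of Gamma' (n x n x m) and Psi' (n x n x n).
    """
    intensity = (Q
                 + 2.0 * np.einsum("ijk,k->ij", Gamma_t, u)
                 + 2.0 * np.einsum("ijk,k->ij", Psi_t, x))
    return intensity / dt

# Illustrative two-state, one-control example (all values are assumptions).
Q = np.array([[1.0, 0.2], [0.2, 0.5]])
Gamma_t = np.zeros((2, 2, 1)); Gamma_t[0, 0, 0] = 0.01
Psi_t = np.zeros((2, 2, 2)); Psi_t[1, 1, 0] = 0.02
u = np.array([1.0])
x = np.array([0.5, -1.0])
dt = 0.01

cov_w = discrete_noise_cov(Q, Gamma_t, Psi_t, u, x, dt)
cov_increment = cov_w * dt**2   # covariance of the increment w_i * dt
```

The increment covariance scales linearly with `dt`, which is the property that makes the discrete problem consistent with the continuous-time white-noise shorthand.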


Aside  from  their  conceptual  interest,  problems  of  this  class  can  arise  in  the  following 
way.  Assume  that  a solution  to  a nominal  deterministic  optimal  control  problem  is 
available  and  that  perturbations  about  this  nominal  path  are  observable  and  controllable. 
Suppose  also  that  there  are  process  and  measurement  noises,  ignored  in  the  nominal 
solution,  whose  covariance  matrixes  possibly  depend  on  the  state  and  control.  A common 
approach  to  minimizing  the  actual  expected  value  of  the  criterion  in  such  a case  is  to 
seek  a feedback  solution  to  an  “accessory  minimum  problem”  for  the  perturbations, 
under  the  assumption  that  they  remain  approximately  infinitesimal.  In  this  context,  the 
accessory  minimum  problem  would  be  constructed  by  linearizing  the  dynamics  and  the 
noise  covariance  dependences  about  the  nominal  path.  If  the  resulting  problem  is  rescaled 
so  that  the  state  and  control  perturbations  are  of  order  unity,  it  is  often  reducible  to  a 
stochastic  optimal  control  problem  of  the  above  form,  where  the  covariance  dependence 
coefficients become the small quantities. Of course, the case of state-dependent measurement
noise cannot be accommodated under the present restriction. Moreover, since linear
terms  in  the  controls  are  included  in  the  criterion  of  this  formulation,  a problem  of  this 
class  could  also  represent  an  iteration  in  a corresponding  second-order  gradient  algorithm, 
in  which  the  linearizations  and  second-order  expansion  of  the  criterion  are  constructed 
about  a trajectory  that  is  not  optimal  in  the  deterministic  problem.  The  importance  of 
this  lies  in  the  possibility  of  iteratively  modifying  the  nominal  path  to  account  optimally 
for  noise-intensity  gradients. 


STATE  ESTIMATION 


For state-independent process noise ($\Psi = 0$), both noise covariance matrixes can be regarded as
known, since the current control values are assumed known. In this case, therefore, it follows from
well-known results for the Kalman-Bucy filter [4] that the conditional probability density of the
state given the available measurements is Gaussian, and that its mean $\hat{x}$ and covariance matrix
$P$ obey the equations

$$\dot{\hat{x}} = F\hat{x} + Gu + PH^T\tilde{R}^{-1}(z - H\hat{x}); \qquad \hat{x}(t_0) = \hat{x}_0 \tag{4}$$

and

$$\dot{P} = FP + PF^T + \tilde{Q} - PH^T\tilde{R}^{-1}HP; \qquad P(t_0) = P_0 \tag{5}$$

where

$$\tilde{Q} = Q + 2\Gamma'u \tag{6}$$

$$\tilde{R} = R + 2\Omega'u. \tag{7}$$
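As a rough illustration of Eqs. (4) through (7), one Euler step of the filter with control-dependent noise covariances might look as follows. This is a sketch under stated assumptions, not the author's implementation: the function name `kalman_bucy_step`, the explicit Euler discretization, and the numerical values are all illustrative, and `Gamma_t` and `Omega_t` stand for $\Gamma'$ and $\Omega'$.

```python
import numpy as np

def kalman_bucy_step(xhat, P, z, u, F, G, H, Q, R, Gamma_t, Omega_t, dt):
    """One explicit Euler step of Eqs. (4)-(5) for state-independent process noise."""
    Q_tilde = Q + 2.0 * np.einsum("ijk,k->ij", Gamma_t, u)   # Eq. (6)
    R_tilde = R + 2.0 * np.einsum("ijk,k->ij", Omega_t, u)   # Eq. (7)
    K = P @ H.T @ np.linalg.inv(R_tilde)                     # filter gain
    xhat_new = xhat + dt * (F @ xhat + G @ u + K @ (z - H @ xhat))
    P_new = P + dt * (F @ P + P @ F.T + Q_tilde - K @ H @ P)
    return xhat_new, P_new

# Illustrative scalar example written with 1x1 arrays (values are assumptions).
F = np.array([[0.0]]); G = np.array([[1.0]]); H = np.array([[1.0]])
Q = np.array([[1.0]]); R = np.array([[1.0]])
Gamma_t = np.full((1, 1, 1), 0.05)
Omega_t = np.full((1, 1, 1), 0.1)
xhat1, P1 = kalman_bucy_step(np.array([0.0]), np.array([[2.0]]), np.array([1.2]),
                             np.array([1.0]), F, G, H, Q, R, Gamma_t, Omega_t, 0.1)
```

Because the control enters $\tilde{Q}$ and $\tilde{R}$, the covariance equation can no longer be integrated ahead of time unless the control history is known in advance.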

For nonzero $\Psi$, it is shown below that this conditional density no longer remains Gaussian to first
order, but rather is of the form

$$p(x) = \left\{1 + \lambda^T(x - \bar{x}) + \mathrm{tr}\left[\tfrac{1}{2}L[(x - \bar{x})(x - \bar{x})^T - V] + \tfrac{1}{3}(x - \bar{x})(x - \bar{x})^T\Lambda(x - \bar{x})\right]\right\} \frac{\exp\left[-\tfrac{1}{2}(x - \bar{x})^TV^{-1}(x - \bar{x})\right]}{(2\pi)^{n/2}|V|^{1/2}} \tag{8}$$

where $V$, $L$, and $\Lambda$ are symmetric, $V$ is positive definite, and the components of $\lambda$,
$L$, and $\Lambda$ are all of order $h$. In general, Eq. (8) can assume negative values for large enough
magnitudes of $(x - \bar{x})$ and must be modified slightly to be a proper probability density. Because
of the rapid decay of the exponential factor, however, these modifications can be confined to a region
whose probability mass is negligible to arbitrary order in $h$ for sufficiently small $h$, so Eq. (8)
will be treated as a proper density in the following. Since it has the form of a Gaussian density
function multiplied by a polynomial, it follows directly from standard results for Gaussian moments
that the integral of Eq. (8) over $R^n$ is unity and that

$$E(x) = \bar{x} + V\lambda + V\,\mathrm{Tr}(\Lambda V) = \mu \tag{9}$$

$$\mathrm{cov}(x) = E[(x - \mu)(x - \mu)^T] = V + VLV - (\mu - \bar{x})(\mu - \bar{x})^T = U. \tag{10}$$
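The moment formulas (9) and (10) can be checked numerically in the scalar case $n = 1$, where the three-way matrix $\Lambda$ reduces to a single number and $\mathrm{Tr}(\Lambda V) = \Lambda V$. The sketch below assumes the cubic term in Eq. (8) carries the coefficient 1/3 and uses arbitrary illustrative parameter values; the function name `perturbed_density_1d` is not from the original.

```python
import numpy as np

def perturbed_density_1d(x, xbar, V, lam, L, Lam):
    """Scalar version of Eq. (8): a Gaussian density times a cubic polynomial."""
    e = x - xbar
    poly = 1.0 + lam * e + 0.5 * L * (e**2 - V) + Lam * e**3 / 3.0
    return poly * np.exp(-0.5 * e**2 / V) / np.sqrt(2.0 * np.pi * V)

# Illustrative small perturbation parameters (assumed values).
xbar, V, lam, L, Lam = 0.3, 0.8, 0.02, -0.01, 0.015

# Simple Riemann-sum quadrature on a wide, fine grid.
x = np.linspace(xbar - 12.0 * np.sqrt(V), xbar + 12.0 * np.sqrt(V), 200001)
dx = x[1] - x[0]
p = perturbed_density_1d(x, xbar, V, lam, L, Lam)

mass = np.sum(p) * dx
mean = np.sum(x * p) * dx
var = np.sum((x - mean) ** 2 * p) * dx

mu = xbar + V * lam + V * Lam * V            # Eq. (9) with n = 1
U = V + V * L * V - (mu - xbar) ** 2         # Eq. (10) with n = 1
```

In this scalar setting the numerically integrated mass, mean, and variance should match unity, `mu`, and `U` to within quadrature error.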

Assume now that the conditional density of the state $x$ at time $t$ is of the form of Eq. (8). After
a short time increment $\Delta$ has elapsed, the conditional density of the state at time
$(t + \Delta)$ can be determined to first order in $\Delta$ by first finding the density of $y$, where

$$y = (I + F\Delta)x + Gu\Delta + \omega,$$

and $\omega$ is a random variable whose distribution given $x$ is zero-mean Gaussian with covariance
matrix $(\tilde{Q} + 2\Psi'x)\Delta$, and then finding the conditional density of $y$ given $z$, where

$$z = Hy + \zeta$$

and $\zeta$ is an independent zero-mean Gaussian random variable with covariance $\tilde{R}/\Delta$.
Since $w$ itself is regarded as a step-function approximation to white noise in Eq. (1), no correction
term of the sort described by Wong and Zakai [5] is needed here to compensate for the state dependence
of the process noise.

If $x$ has the density function of Eq. (8) and $s = Kx + b$, where $K$ and $b$ are constants and
$K^{-1}$ exists, it is straightforward but tedious to show from standard results for the
transformation of probability densities that $s$ also has a density of the same form, namely

$$p(s) = \left\{1 + \lambda^TK^{-1}(s - \bar{s}) + \mathrm{tr}\left[\tfrac{1}{2}K^{-1T}LK^{-1}[(s - \bar{s})(s - \bar{s})^T - KVK^T] + \tfrac{1}{3}(s - \bar{s})(s - \bar{s})^T(K^{-1T}\Lambda K^{-1})'K^{-1}(s - \bar{s})\right]\right\} \frac{\exp\left[-\tfrac{1}{2}(s - \bar{s})^T(KVK^T)^{-1}(s - \bar{s})\right]}{(2\pi)^{n/2}|KVK^T|^{1/2}}, \qquad \bar{s} = K\bar{x} + b. \tag{11}$$

If $s$ is used now to denote $(I + F\Delta)x$ in particular, it follows that $s$ has a density of the
form of Eq. (8), since $(I + F\Delta)^{-1}$ always exists for sufficiently small $\Delta$.
Furthermore, its $\lambda$, $L$, and $\Lambda$ parameters differ from those of the density of $x$ only
by order $h\Delta$. From general results for means and covariances,

$$E(s) = (I + F\Delta)\mu = \mu_2$$

$$\mathrm{cov}(s) = (I + F\Delta)U(I + F^T\Delta) = U_2.$$

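Since knowing $s$ is equivalent to knowing $x$, the probability density of $r = s + \omega$ is given
by the equation

$$p(r) = p(s + \omega) = \int_{R^n} p_s(r - \omega)\,p_{\omega/x}[\omega, (I + F\Delta)^{-1}(r - \omega)]\,d\omega. \tag{12}$$

It is assumed initially that $\tilde{Q}^{-1}$ exists, in which case the density
$p_{\omega/x}[\omega, (I + F\Delta)^{-1}x]$ can be approximated to order $h$ as

$$p_{\omega/x}[\omega, (I + F\Delta)^{-1}x] = \left\{1 - \mathrm{tr}\left[\tilde{Q}^{-1}\tilde{\Psi}'x\left(I - \frac{1}{\Delta}\tilde{Q}^{-1}\omega\omega^T\right)\right]\right\} \frac{\exp\left(-\frac{1}{2\Delta}\omega^T\tilde{Q}^{-1}\omega\right)}{(2\pi)^{n/2}|\tilde{Q}\Delta|^{1/2}}, \qquad \tilde{\Psi}' = \Psi'(I + F\Delta)^{-1}. \tag{13}$$

Using Eq. (8) to represent $p_s(x)$ and substituting it and Eq. (13) in Eq. (12) gives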


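$$p(r) = \int_{R^n} f(\omega)\,\frac{\exp\left\{-\tfrac{1}{2}\left[(r - \omega - \bar{x})^TV^{-1}(r - \omega - \bar{x}) + \frac{1}{\Delta}\omega^T\tilde{Q}^{-1}\omega\right]\right\}}{(2\pi)^{n}|V\tilde{Q}\Delta|^{1/2}}\,d\omega$$

to order $h\Delta$, where

$$f(\omega) = 1 + \lambda^T(r - \omega - \bar{x}) + \mathrm{tr}\left\{\tfrac{1}{2}L[(r - \omega - \bar{x})(r - \omega - \bar{x})^T - V] + \tfrac{1}{3}(r - \omega - \bar{x})(r - \omega - \bar{x})^T\Lambda(r - \omega - \bar{x}) - \tilde{Q}^{-1}\tilde{\Psi}'(r - \omega)\left(I - \frac{1}{\Delta}\tilde{Q}^{-1}\omega\omega^T\right)\right\}.$$

Completing the square in the exponent gives

$$p(r) = \frac{\exp\left[-\tfrac{1}{2}(r - \bar{x})^T(V + \tilde{Q}\Delta)^{-1}(r - \bar{x})\right]}{(2\pi)^{n/2}|V + \tilde{Q}\Delta|^{1/2}} \times \int_{R^n} f(\omega)\,\frac{\exp\left\{-\tfrac{1}{2}[\omega - \tilde{Q}V^{-1}(r - \bar{x})\Delta]^T[\tilde{Q}\Delta - \tilde{Q}V^{-1}\tilde{Q}\Delta^2]^{-1}[\omega - \tilde{Q}V^{-1}(r - \bar{x})\Delta]\right\}}{(2\pi)^{n/2}|\tilde{Q}\Delta - \tilde{Q}V^{-1}\tilde{Q}\Delta^2|^{1/2}}\,d\omega.$$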

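The integrand in this expression is a Gaussian density multiplied by a polynomial, so it is
straightforward to verify that, to first order in $\Delta$,

$$p(r) = \left\{1 + \tilde{\lambda}^T(r - \bar{x}) + \mathrm{tr}\left[\tfrac{1}{2}\tilde{L}[(r - \bar{x})(r - \bar{x})^T - (V + \tilde{Q}\Delta)] + \tfrac{1}{3}(r - \bar{x})(r - \bar{x})^T\tilde{\Lambda}(r - \bar{x})\right]\right\} \frac{\exp\left[-\tfrac{1}{2}(r - \bar{x})^T(V + \tilde{Q}\Delta)^{-1}(r - \bar{x})\right]}{(2\pi)^{n/2}|V + \tilde{Q}\Delta|^{1/2}} \tag{14}$$

where

$$\tilde{\lambda} = \lambda + \Delta[\mathrm{Tr}(\tilde{Q}\Lambda) - V^{-1}(\tilde{Q}\lambda + 2\,\mathrm{Tr}\,\tilde{\Psi}'') - \mathrm{Tr}(\tilde{\Psi}V^{-1})] \tag{15}$$

$$\tilde{L} = L + \Delta\{[(V^{-1}\tilde{\Psi}V^{-1})'\bar{x} - L\tilde{Q}V^{-1}] + [(V^{-1}\tilde{\Psi}V^{-1})'\bar{x} - L\tilde{Q}V^{-1}]^T\} \tag{16}$$

$$\tilde{\Lambda} = \Lambda + \Delta[(V^{-1}\tilde{\Psi}V^{-1} - \Lambda\tilde{Q}V^{-1}) + (V^{-1}\tilde{\Psi}V^{-1} - \Lambda\tilde{Q}V^{-1})' + (V^{-1}\tilde{\Psi}V^{-1} - \Lambda\tilde{Q}V^{-1})'']. \tag{17}$$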

This is again a density of the form of Eq. (8). From the definition of $s$ and Eqs. (11) and (15)
through (17), the components of $\lambda$, $L$, and $\Lambda$ for $x$ and $r$ differ only by order
$h\Delta$. Applying Eqs. (9) and (10) to Eqs. (14) through (17) shows that, since $y = r + Gu\Delta$,

$$E(y) = \mu_2 + Gu\Delta = \mu_3 \tag{18}$$

$$\mathrm{cov}(y) = U_2 + (\tilde{Q} + 2\Psi'\mu)\Delta = M \tag{19}$$

which can also be verified directly by decomposing the expectations. The same result holds for
singular $\tilde{Q}$ by continuity, since it does not involve $\tilde{Q}^{-1}$. Equation (11) implies
that $y$ has the same density parameters as $r$, except that $\bar{x}$ increases by $Gu\Delta$.


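As a function of $y$, conditional density $p(y/z)$ is proportional to $p(z/y)p(y)$. Completing the
square in the exponent of this product shows that

$$p(y/z) = g\left\{1 + \tilde{\lambda}^T(y - \bar{x}) + \mathrm{tr}\left[\tfrac{1}{2}\tilde{L}[(y - \bar{x})(y - \bar{x})^T - (V + \tilde{Q}\Delta)] + \tfrac{1}{3}(y - \bar{x})(y - \bar{x})^T\tilde{\Lambda}(y - \bar{x})\right]\right\} \exp\left[-\tfrac{1}{2}(y - \bar{y})^T\bar{V}^{-1}(y - \bar{y})\right]$$

where $g$ is a constant of proportionality, $\bar{x}$ now denotes the parameter in Eq. (8) for $p(y)$,
and

$$\bar{V} = V + \tilde{Q}\Delta - VH^T\tilde{R}^{-1}HV\Delta$$

$$\bar{y} = \bar{x} + VH^T\tilde{R}^{-1}(z - H\bar{x})\Delta.$$

The polynomial factor in $p(y/z)$ can be expressed as

$$a + b^T(y - \bar{y}) + \mathrm{tr}\left\{\tfrac{1}{2}L^*[(y - \bar{y})(y - \bar{y})^T - \bar{V}] + \tfrac{1}{3}(y - \bar{y})(y - \bar{y})^T\Lambda^*(y - \bar{y})\right\}$$

where

$$a = 1 + \Delta\left\{\tilde{\lambda}^TVH^T\tilde{R}^{-1}(z - H\bar{x}) + \mathrm{tr}\left(V\tilde{\Lambda}VH^T\tilde{R}^{-1}(z - H\bar{x}) + \tfrac{1}{2}\tilde{L}\{VH^T\tilde{R}^{-1}[(z - H\bar{x})(z - H\bar{x})^T\Delta - \tilde{R}]\tilde{R}^{-1}HV\}\right)\right\}$$

$$b = \tilde{\lambda} + \Delta[\tilde{L} + \tilde{\Lambda}VH^T\tilde{R}^{-1}(z - H\bar{x})\Delta]VH^T\tilde{R}^{-1}(z - H\bar{x})$$

$$L^* = \tilde{L} + 2\tilde{\Lambda}VH^T\tilde{R}^{-1}(z - H\bar{x})\Delta$$

$$\Lambda^* = \tilde{\Lambda}$$

to first order in $\Delta$. The quantity $(z - H\bar{x})\Delta$ is regarded as a term of order
$\Delta^{1/2}$ here. Since $p(y/z)$ is a probability density and must integrate to unity, $a$ can be
absorbed into the proportionality constant $g$ to express this density in the form of Eq. (8) such
that the $\lambda$, $L$, and $\Lambda$ components differ from those of $\tilde{\lambda}$, $\tilde{L}$,
and $\tilde{\Lambda}$ only by terms of order $h\Delta$ and zero-mean random terms of order
$h\Delta^{1/2}$. Carrying out the details to order $\Delta$ (only to order $\Delta^{1/2}$ for
zero-mean random terms) and using Eqs. (9) and (10) show that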


$$E(y/z) = \mu_3 + MH^T\tilde{R}^{-1}(z - H\mu_3)\Delta + M\,\mathrm{Tr}\{MH^T\tilde{R}^{-1}[(z - H\mu_3)(z - H\mu_3)^T\Delta - \tilde{R}]\tilde{R}^{-1}HM\Lambda\}\Delta + \text{terms of order } h^2\Delta \tag{20}$$

$$\mathrm{cov}(y/z) = M - MH^T\tilde{R}^{-1}HM\Delta + 2(M\Lambda M)'MH^T\tilde{R}^{-1}(z - H\mu_3)\Delta + \text{terms of order } h^2\Delta + \text{zero-mean random terms of order } h^2\Delta^{1/2} \tag{21}$$

$$\Lambda = \tilde{\Lambda} + \text{terms of order } h^2\Delta + \text{zero-mean random terms of order } h^2\Delta^{1/2}. \tag{22}$$

Efforts to obtain similar results in this way with state-dependent measurement noise have been
unsuccessful at this point, possibly because more information about the state is given in this case by
the scatter of a series of measurements over a short period of time than by their average value. Since
the zero-mean random terms are statistically independent for disjoint time increments and since the
third term in Eq. (20) is a zero-mean random term of order $\Delta$, the last two types of terms in
each of Eqs. (20) through (22) can be neglected because they only contribute effects of order $h^2$ or
smaller when “integrated” over a time interval of order unity.

If we return to the notation of Eqs. (4) and (5), the overall result is that

$$\hat{x}(t + \Delta) = \hat{x}(t) + \Delta\{F(t)\hat{x}(t) + G(t)u(t) + P(t)H^T(t)\tilde{R}^{-1}(t)[z(t) - H(t)\hat{x}(t)]\},$$

$$P(t + \Delta) = P(t) + \Delta\{F(t)P(t) + P(t)F^T(t) + \tilde{Q}(t) + 2\Psi'(t)\hat{x}(t) - P(t)H^T(t)\tilde{R}^{-1}(t)H(t)P(t) + 2[P(t)\Lambda(t)P(t)]'P(t)H^T(t)\tilde{R}^{-1}(t)[z(t) - H(t)\hat{x}(t)]\},$$

$$\Lambda(t + \Delta) = \Lambda(t) + \Delta\big(\{P^{-1}(t)\Psi(t)P^{-1}(t) - \Lambda(t)[F(t) + Q(t)P^{-1}(t)]\} + \{P^{-1}(t)\Psi(t)P^{-1}(t) - \Lambda(t)[F(t) + Q(t)P^{-1}(t)]\}' + \{P^{-1}(t)\Psi(t)P^{-1}(t) - \Lambda(t)[F(t) + Q(t)P^{-1}(t)]\}''\big)$$

to first order in $\Delta$, except for terms contributing effects of order $\Delta$ but of second order
in $h$. Furthermore, the $\lambda$, $L$, and $\Lambda$ components describing the conditional density of
$x$ in the notation of Eq. (8) change only by amounts of order $h\Delta$ in this interval, so they
remain of order $h$; also, $L$ and $\Lambda$ remain symmetric. It is convenient at this point to
express the conditional covariance matrix as the sum

$$P = \bar{P} + 2D$$

where $\bar{P}$ is a “nominal covariance matrix” defined as a deterministic time function by the
classical Kalman-Bucy filter equation:

$$\dot{\bar{P}} = F\bar{P} + \bar{P}F^T + Q - \bar{P}H^TR^{-1}H\bar{P}; \qquad \bar{P}(t_0) = P_0. \tag{23}$$

Since $\tilde{R}^{-1} = (R + 2\Omega'u)^{-1} = R^{-1}(R - 2\Omega'u)R^{-1}$ to first order in $h$ and
since the components of $D$ are of order $h$, the mean $\hat{x}$ and covariance matrix $\bar{P} + 2D$
of the conditional state distribution are determined to first order in $h$ in the limit as
$\Delta \to 0$, by Eq. (23) and the equations

$$\dot{\hat{x}} = F\hat{x} + Gu + [\bar{P}H^T(I - 2R^{-1}\Omega'u) + 2DH^T]R^{-1}(z - H\hat{x}); \qquad \hat{x}(t_0) = \hat{x}_0 \tag{24}$$

$$\dot{D} = (F - \bar{P}H^TR^{-1}H)D + D(F^T - H^TR^{-1}H\bar{P}) + \Psi'\hat{x} + (\Gamma + \bar{P}H^TR^{-1}\Omega R^{-1}H\bar{P})'u + (\bar{P}\Lambda\bar{P})'\bar{P}H^TR^{-1}(z - H\hat{x}); \qquad D(t_0) = 0 \tag{25}$$

$$\dot{\Lambda} = \Theta + \Theta' + \Theta'', \qquad \Theta = \bar{P}^{-1}\Psi\bar{P}^{-1} - \Lambda(F + Q\bar{P}^{-1}); \qquad \Lambda(t_0) = 0 \tag{26}$$


where the “$t$” argument is suppressed in the notation. Three-way matrix $\Lambda$ is a deterministic
time function related to the skewness of the conditional state distribution, and is identically zero
in the case of state-independent process noise, when $\Psi(t)$ is identically zero. To first order in
$h$, therefore, the so-called information state for this estimation process consists of both
$\hat{x}$ and $D$, not just $\hat{x}$, and hence differs significantly from the system state $x$.
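To make the structure of Eqs. (23) through (26) concrete, the following sketch evaluates their right-hand sides in the scalar case $n = m = k = 1$, where every three-way matrix is a single number and the transposes have no effect. The function names, the dictionary of parameters, and the numerical values are illustrative assumptions, not part of the original analysis.

```python
def first_order_gain(Pbar, D, u, p):
    """Scalar filter gain in Eq. (24): [Pbar*H*(1 - 2*Omega*u/R) + 2*D*H] / R."""
    return (Pbar * p["H"] * (1.0 - 2.0 * p["Omega"] * u / p["R"]) + 2.0 * D * p["H"]) / p["R"]

def estimator_rates(xhat, D, Pbar, Lam, z, u, p):
    """Right-hand sides of the scalar versions of Eqs. (24), (25), (23) and (26)."""
    F, G, H, Q, R = p["F"], p["G"], p["H"], p["Q"], p["R"]
    Gam, Om, Psi = p["Gamma"], p["Omega"], p["Psi"]
    innov = z - H * xhat
    dxhat = F * xhat + G * u + first_order_gain(Pbar, D, u, p) * innov          # Eq. (24)
    dD = (2.0 * (F - Pbar * H * H / R) * D + Psi * xhat
          + (Gam + (Pbar * H / R) ** 2 * Om) * u
          + Lam * Pbar ** 3 * H / R * innov)                                    # Eq. (25)
    dPbar = 2.0 * F * Pbar + Q - (Pbar * H) ** 2 / R                            # Eq. (23)
    dLam = 3.0 * (Psi / Pbar ** 2 - Lam * (F + Q / Pbar))                       # Eq. (26)
    return dxhat, dD, dPbar, dLam

# With Gamma = Omega = Psi = 0 the covariance perturbation D and skewness Lam stay at zero.
classical = dict(F=-0.5, G=1.0, H=1.0, Q=1.0, R=1.0, Gamma=0.0, Omega=0.0, Psi=0.0)
rates_classical = estimator_rates(0.2, 0.0, 2.0, 0.0, 1.0, 0.3, classical)

# For small perturbations the first-order gain is close to the exact gain (Pbar + 2D)H/(R + 2*Omega*u).
small = dict(classical, Omega=1e-3)
gain_first_order = first_order_gain(2.0, 1e-3, 1.0, small)
gain_exact = (2.0 + 2.0 * 1e-3) * 1.0 / (1.0 + 2.0 * 1e-3 * 1.0)
```

Only the $\hat{x}$ and $D$ equations are driven by the measurement residual; $\bar{P}$ and $\Lambda$ evolve deterministically.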


OPTIMIZATION 

It follows from the arguments of Stratonovich [6] and Striebel [7] that an “optimal cost function” can
be defined consistently here in terms of $t$ and the current conditional distribution of the state
given the preceding measurements. It is assumed that conditions are such that the solution to the
corresponding Bellman equation and boundary condition is unique and that it is sufficiently regular
that second-order changes in the equation produce only second-order changes in the solution. It is
convenient to proceed by considering the possibility of such cost functions depending only on
$\hat{x}$, $D$, and $t$ to first order in $h$, in which case there exists a (scalar) function
$J(\hat{x}, D, t)$ such that

$J(\hat{x}, D, t)$ = conditional expected “cost-to-go,”

that is,

$$E\left\{\tfrac{1}{2}\left[x^T(t_f)S_fx(t_f) + \int_t^{t_f}(x^TAx + u^TBu + 2c^Tu)\,dt\right]\right\},$$

using an optimal control law, given that $\hat{x}(t) = \hat{x}$ and $P(t) = \bar{P}(t) + 2D$, plus
terms of second order in $h$ or smaller. (The question of the possible nonexistence of an optimal
control law is not examined here.) The usual invariant imbedding formalism of dynamic programming (see
Dreyfus [8], for example), using $\hat{x}$ and $D$ as state variables and neglecting second-order
terms, shows that the Bellman equation reduces to the following equation for $J$ in this case:

$$0 = \min_u E\left[\tfrac{1}{2}(x^TAx + u^TBu) + c^Tu + J_{\hat{x}}\dot{\hat{x}} + J_t + \mathrm{tr}\left(J_D\dot{D} + \tfrac{1}{2}J_{\hat{x}\hat{x}}\dot{\hat{x}}\dot{\hat{x}}^T\Delta\right) + \sum_{ijk}(J_{D\hat{x}})_{ijk}\dot{D}_{ij}\dot{\hat{x}}_k\Delta\right] \tag{27}$$

where the expectation is conditioned on the event $\hat{x}(t) = \hat{x}$ and
$P(t) = \bar{P}(t) + 2D$, or, equivalently, on $Z(t)$. Evaluating the conditional expected cost-to-go
at the terminal time shows that $J$ must also satisfy the boundary condition

$$J(\hat{x}, D, t_f) = \tfrac{1}{2}\hat{x}^TS_f\hat{x} + \mathrm{tr}\left[\tfrac{1}{2}S_f\bar{P}(t_f) + S_fD\right]. \tag{28}$$

If a function $J(\hat{x}, D, t)$ that satisfies (27) and (28) to first order in $h$ can be found, then
it is a first-order approximation to the optimal cost function and determines the optimal control law
to first order by the regularity assumption.


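Taking expected values in Eq. (27) and using Eqs. (24) and (25) gives, to first order in $h$,

$$0 = \min_u\Big(\tfrac{1}{2}\{\hat{x}^TA\hat{x} + \mathrm{tr}[A(\bar{P} + 2D)] + u^TBu\} + c^Tu + J_{\hat{x}}(F\hat{x} + Gu) + J_t + \mathrm{tr}\big[J_D(FD + DF^T + \Gamma'u + \Psi'\hat{x} - DH^TR^{-1}H\bar{P} - \bar{P}H^TR^{-1}HD + \bar{P}H^TR^{-1}\Omega'uR^{-1}H\bar{P}) + J_{\hat{x}\hat{x}}(\tfrac{1}{2}\bar{P}H^TR^{-1}H\bar{P} + DH^TR^{-1}H\bar{P} + \bar{P}H^TR^{-1}HD - \bar{P}H^TR^{-1}\Omega'uR^{-1}H\bar{P})\big] + \sum_{ijk}(\bar{P}\Lambda\bar{P}H^TR^{-1}H\bar{P})_{ijk}(\bar{P}J_{D\hat{x}})_{jik}\Big). \tag{29}$$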

ijk  ' 

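Collecting terms and cyclically permuting matrix products in the trace operand gives

$$0 = \min_u\Big(\tfrac{1}{2}(\hat{x}^TA\hat{x} + u^TBu) + c^Tu + J_{\hat{x}}[F\hat{x} + Gu] + J_t + \mathrm{tr}\big\{\tfrac{1}{2}A\bar{P} + \tfrac{1}{2}J_{\hat{x}\hat{x}}\bar{P}H^TR^{-1}H\bar{P} + [A + J_DF + F^TJ_D - (J_D - J_{\hat{x}\hat{x}})\bar{P}H^TR^{-1}H - H^TR^{-1}H\bar{P}(J_D - J_{\hat{x}\hat{x}})]D + [R^{-1}H\bar{P}(J_D - J_{\hat{x}\hat{x}})\bar{P}H^TR^{-1}\Omega' + J_D\Gamma']u\big\} + [\mathrm{Tr}(J_D\Psi)]^T\hat{x} + \sum_{ijk}(\bar{P}\Lambda\bar{P}H^TR^{-1}H\bar{P})_{ijk}(\bar{P}J_{D\hat{x}})_{jik}\Big). \tag{30}$$

Equating the $u$-derivative of the right-hand side of Eq. (30) to zero specifies the minimizing
control as

$$u = -B^{-1}\{G^TJ_{\hat{x}}^T + c + \mathrm{Tr}[R^{-1}H\bar{P}(J_D - J_{\hat{x}\hat{x}})\bar{P}H^TR^{-1}\Omega + J_D\Gamma]\}. \tag{31}$$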

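Substituting Eq. (31) into Eq. (30) to eliminate the minimization operation gives

$$\tfrac{1}{2}\hat{x}^TA\hat{x} + J_{\hat{x}}F\hat{x} + [\mathrm{Tr}(J_D\Psi)]^T\hat{x} + J_t + \mathrm{tr}\big\{\tfrac{1}{2}A\bar{P} + \tfrac{1}{2}J_{\hat{x}\hat{x}}\bar{P}H^TR^{-1}H\bar{P} + [A + J_DF + F^TJ_D - (J_D - J_{\hat{x}\hat{x}})\bar{P}H^TR^{-1}H - H^TR^{-1}H\bar{P}(J_D - J_{\hat{x}\hat{x}})]D\big\} - \tfrac{1}{2}\{G^TJ_{\hat{x}}^T + c + \mathrm{Tr}[J_D\Gamma + R^{-1}H\bar{P}(J_D - J_{\hat{x}\hat{x}})\bar{P}H^TR^{-1}\Omega]\}^TB^{-1}\{G^TJ_{\hat{x}}^T + c + \mathrm{Tr}[J_D\Gamma + R^{-1}H\bar{P}(J_D - J_{\hat{x}\hat{x}})\bar{P}H^TR^{-1}\Omega]\} + \sum_{ijk}(\bar{P}\Lambda\bar{P}H^TR^{-1}H\bar{P})_{ijk}(\bar{P}J_{D\hat{x}})_{jik} = 0. \tag{32}$$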

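For the function $J$ defined as

$$J(\hat{x}, D, t) = \tfrac{1}{2}\hat{x}^TS\hat{x} + \phi^T\hat{x} + \mathrm{tr}(ND) + \tfrac{1}{2}\epsilon \tag{33}$$

where $S$, $\phi$, $N$, and $\epsilon$ are (deterministic) functions of $t$ only, the partial
derivatives are

$$J_{\hat{x}} = \hat{x}^TS + \phi^T, \qquad J_D = N, \qquad J_{\hat{x}\hat{x}} = S, \qquad J_{D\hat{x}} = 0, \qquad J_t = \tfrac{1}{2}\hat{x}^T\dot{S}\hat{x} + \dot{\phi}^T\hat{x} + \mathrm{tr}(\dot{N}D) + \tfrac{1}{2}\dot{\epsilon}. \tag{34}$$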


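It can be verified by substituting Eq. (33) in Eq. (28) and Eq. (34) in Eq. (32) that Eqs. (28) and
(32) are satisfied by Eq. (33) if

$$\dot{S} = -SF - F^TS - A + SGB^{-1}G^TS; \qquad S(t_f) = S_f \tag{35}$$

$$\dot{N} = -NF - F^TN - A + H^TR^{-1}H\bar{P}(N - S) + (N - S)\bar{P}H^TR^{-1}H; \qquad N(t_f) = S_f \tag{36}$$

$$\dot{\phi} = (SGB^{-1}G^T - F^T)\phi + SGB^{-1}\{c + \mathrm{Tr}[N\Gamma + R^{-1}H\bar{P}(N - S)\bar{P}H^TR^{-1}\Omega]\} - \mathrm{Tr}(N\Psi); \qquad \phi(t_f) = 0 \tag{37}$$

$$\dot{\epsilon} = \{G^T\phi + c + \mathrm{Tr}[N\Gamma + R^{-1}H\bar{P}(N - S)\bar{P}H^TR^{-1}\Omega]\}^TB^{-1}\{G^T\phi + c + \mathrm{Tr}[N\Gamma + R^{-1}H\bar{P}(N - S)\bar{P}H^TR^{-1}\Omega]\} - \mathrm{tr}(A\bar{P} + S\bar{P}H^TR^{-1}H\bar{P}); \qquad \epsilon(t_f) = \mathrm{tr}[S_f\bar{P}(t_f)]. \tag{38}$$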


Therefore, it follows from Eqs. (31) and (34) that the optimal control law can be expressed to first
order in $h$ in terms of the solution of the terminal value system of ordinary differential equations
(Eqs. (35) through (37)) as

$$u = -B^{-1}\{G^TS\hat{x} + c + G^T\phi + \mathrm{Tr}[R^{-1}H\bar{P}(N - S)\bar{P}H^TR^{-1}\Omega + N\Gamma]\} \tag{39}$$

where $\bar{P}$ and $\hat{x}$ are given by the initial value system of ordinary differential equations
(23) through (26). The implementation of this control law requires that Eqs. (24) and (25) be
integrated in real time (a total of $(n/2)(n + 3)$ independent components for $n$ state variables) to
provide the current values of $\hat{x}$; the other differential equations can be solved beforehand by
integrating Eqs. (23) and (26) forward, then Eqs. (35) through (37) backward.
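The offline part of this procedure can be sketched for the scalar case $n = m = k = 1$, in which $\mathrm{Tr}[\cdot]$ in Eq. (39) reduces to an ordinary product. The code below is a minimal illustration under stated assumptions: explicit Euler integration on a uniform grid, hypothetical function names (`precompute_gains`, `control`), and arbitrary parameter values. It omits Eq. (26), since $\Lambda$ does not enter the scalar control law directly.

```python
import numpy as np

def precompute_gains(p, t0, tf, steps):
    """Integrate Eq. (23) forward and Eqs. (35)-(37) backward (scalar case, Euler)."""
    F, G, H, A, B, c = (p[k] for k in ("F", "G", "H", "A", "B", "c"))
    Q, R, Gam, Om, Psi = (p[k] for k in ("Q", "R", "Gamma", "Omega", "Psi"))
    dt = (tf - t0) / steps
    Pbar = np.empty(steps + 1)
    Pbar[0] = p["P0"]
    for i in range(steps):                                   # Eq. (23), forward in time
        Pbar[i + 1] = Pbar[i] + dt * (2 * F * Pbar[i] + Q - (Pbar[i] * H) ** 2 / R)
    S = np.empty(steps + 1); N = np.empty(steps + 1); phi = np.empty(steps + 1)
    S[-1] = N[-1] = p["Sf"]
    phi[-1] = 0.0

    def tau_of(i):
        # Scalar value of Tr[R^-1 H Pbar (N - S) Pbar H^T R^-1 Omega + N Gamma]
        return N[i] * Gam + (H * Pbar[i] / R) ** 2 * (N[i] - S[i]) * Om

    for i in range(steps, 0, -1):                            # Eqs. (35)-(37), backward in time
        dS = -2 * F * S[i] - A + (S[i] * G) ** 2 / B
        dN = -2 * F * N[i] - A + 2 * H * H * Pbar[i] * (N[i] - S[i]) / R
        dphi = (S[i] * G * G / B - F) * phi[i] + S[i] * G / B * (c + tau_of(i)) - N[i] * Psi
        S[i - 1] = S[i] - dt * dS
        N[i - 1] = N[i] - dt * dN
        phi[i - 1] = phi[i] - dt * dphi
    tau = np.array([tau_of(i) for i in range(steps + 1)])
    return S, phi, tau

def control(xhat, S_t, phi_t, tau_t, p):
    """Scalar form of Eq. (39)."""
    return -(p["G"] * S_t * xhat + p["c"] + p["G"] * phi_t + tau_t) / p["B"]

classical = dict(F=0.0, G=1.0, H=1.0, A=0.0, B=1.0, c=0.0, Q=1.0, R=1.0,
                 Gamma=0.0, Omega=0.0, Psi=0.0, P0=1.0, Sf=1.0)
S_c, phi_c, tau_c = precompute_gains(classical, 0.0, 1.0, 2000)

perturbed = dict(classical, Gamma=0.01)
S_p, phi_p, tau_p = precompute_gains(perturbed, 0.0, 1.0, 2000)
```

In real time, only $\hat{x}$ and $D$ would then be propagated from the measurements, with the stored `S`, `phi`, and `tau` histories looked up at the current time and passed to `control`.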


When $\Gamma$, $\Omega$, and $\Psi$ are identically zero, Eqs. (23) through (26), (35), (37), and (39)
reduce to the well-known solution

$$u = -B^{-1}[G^T(S\hat{x} + \bar{\phi}) + c]$$

$$\dot{\hat{x}} = F\hat{x} + Gu + \bar{P}H^TR^{-1}(z - H\hat{x}); \qquad \hat{x}(t_0) = \hat{x}_0 \tag{40}$$

$$\dot{\bar{\phi}} = (SGB^{-1}G^T - F^T)\bar{\phi} + SGB^{-1}c; \qquad \bar{\phi}(t_f) = 0 \tag{41}$$

of the corresponding classical linear-quadratic-Gaussian problem. Two first-order departures from this
classical solution are induced in the optimal control law by first-order nonzero values of these
quantities. One is the augmentation of the state-estimation equations by using Eqs. (24) through (26)
instead of Eq. (40) for determining $\hat{x}$. The other is the addition of the deterministic time
function

$$\delta u = -B^{-1}\{G^T(\phi - \bar{\phi}) + \mathrm{Tr}[R^{-1}H\bar{P}(N - S)\bar{P}H^TR^{-1}\Omega + N\Gamma]\} \tag{42}$$

to the control. This structure is displayed schematically in Fig. 1. Since the conditional expected
value of the driving term $(z - H\hat{x})$ in Eq. (24) is always zero, so is its prior expected value,
and it follows from Eqs. (1) and (24) that the prior expected values of $x(t)$ and $\hat{x}(t)$ are
always the same. Therefore, the mean sample trajectory of the optimally controlled system can be
determined by Eqs. (23), (35) through (37), (41), (42) and the equations

Fig. 1 - Optimal control law structure

$$\dot{\bar{x}} = F\bar{x} + G\bar{u}; \qquad \bar{x}(t_0) = \hat{x}_0 \tag{43}$$

$$\bar{u} = -B^{-1}[G^T(S\bar{x} + \bar{\phi}) + c] + \delta u \tag{44}$$

where  3c(t)  and  u(t)  denote  the  (prior)  expected  values  of  the  state  and  control  at  time  t 
under  the  optimal  control  law.  If  the  control  problem  here  represents  a second-order 
description  of  the  effects  of  perturbations  about  a nominal  path  in  an  iteration  of  a 
gradient  algorithm,  this  mean  sample  trajectory  is  a natural  candidate  for  the  nominal 
path  in  the  next  iteration.  Such  a gradient  algorithm  would  converge  in  general  to  a 
nominal  path  that  is  different  from  the  deterministic  optimal  (with  c + GT<j>  = Bbu 
instead  of  zero),  the  result  representing  a compromise  between  the  deterministic  optimal 
and  a path  encountering  the  lowest  expected  noise  intensities.  Thus  the  noise  statistics 
in  this  context  enter  into  the  optimization  of  the  nominal  path  as  well  as  the  correction 
of  noise-induced  deviations  from  this  nominal. 

It is interesting to note from Eqs. (33), (37), and (38) that the introduction of first-order values of $\Gamma$, $\Omega$, and $\Psi$ only changes the optimal cost function from the classical value by second order when $c$, $\hat{x}$, and $D$ are all zero, a condition of particular interest when the problem arises from a perturbation analysis. However, this is easily shown to be the case for the cost function associated with any control law which differs only by first order from the classical optimum. Since the only first-order approximations used in the derivation here were the dynamics for $\hat{x}$ and $D$ in the Bellman equation, and since first-order accuracy in the dynamics is sufficient for second-order accuracy in the cost in deterministic control problems of this sort [1], this property suggests that the cost function of Eqs. (33) and (35) through (38) might actually be accurate to second order in $h$ under these conditions. The approximations here are of a somewhat different character, however, because they pertain to the dynamics of an information state $(\hat{x}, D)$ that is significantly different from the system state $x$, and even the simple scalar problem with $c = \Omega = \Psi = 0$ provides a counterexample to this conjecture. It is instructive to examine this example in detail to reveal how such a phenomenon can occur and also to enhance the plausibility of the regularity assumption by verifying that no contradiction arises here when the analysis is extended to second order. In this case, the conditional distribution of the state is exactly Gaussian, with parameters that have the following dynamics to arbitrary order in $h$:


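$$\dot{\hat{x}} = f\hat{x} + gu + \frac{p + 2d}{r}(z - \hat{x}); \quad \hat{x}(t_0) = \hat{x}_0$$

$$\dot{d} = 2\left(f - \frac{p}{r}\right)d + \gamma u - \frac{2d^2}{r}; \quad d(t_0) = 0.$$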

The  structure  of  these  equations  is  such  that  there  can  be  no  Wong-Zakai  correction  terms 
for  any  reasonable  control  law  generating  u.  Since  this  conditional  distribution  is  always 
Gaussian, the optimal cost function $J$ can be expressed exactly as a function of only $\hat{x}$, $d$, and $t$. Using their dynamics in the Bellman equation and minimizing gives the following equations, accurate to arbitrary order in $h$:


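$$J_t + \frac{1}{2}a(\hat{x}^2 + p + 2d) + J_{\hat{x}}f\hat{x} + J_d\left[2\left(f - \frac{p}{r}\right)d - \frac{2d^2}{r}\right] + \frac{(p + 2d)^2}{2r}J_{\hat{x}\hat{x}} - \frac{1}{2b}(gJ_{\hat{x}} + \gamma J_d)^2 = 0$$

$$J(\hat{x}, d, t_f) = \frac{1}{2}s_f(\hat{x}^2 + p(t_f) + 2d).$$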


An  exact  solution  to  these  equations  is  not  available  in  closed  form,  but  J can  be  imagined 
as  a power  series  in  x and  d with  time-dependent  coefficients,  and  such  a function  of  the 
form 


$$J = \frac{1}{2}s\hat{x}^2 + \phi\hat{x} + \nu d + \mu d^2 + \lambda\hat{x}d + \epsilon + \text{third-order terms}$$


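can be examined as a possible solution. Substituting this expression and its derivatives into the partial differential equation and collecting coefficients of like powers of $\hat{x}$ and $d$ show that all but third-order terms vanish on the left-hand side (remember that $d$ itself is of order $h$) and that the boundary condition is satisfied exactly if $\nu = s + y$ and if

$$\dot{s} = -2sf - a + \frac{s^2g^2}{b} + \frac{2}{b}sg\gamma\lambda; \quad s(t_f) = s_f$$

$$\dot{y} = 2\left(\frac{p}{r} - f\right)y - \frac{s^2g^2}{b} - \frac{2}{b}sg\gamma\lambda; \quad y(t_f) = 0$$

$$\dot{\phi} = \left(\frac{sg^2}{b} - f\right)\phi + \frac{sg\gamma\nu}{b}; \quad \phi(t_f) = 0$$

$$\dot{\mu} = 4\left(\frac{p}{r} - f\right)\mu + \frac{2y}{r}; \quad \mu(t_f) = 0$$

$$\dot{\lambda} = \left(\frac{2p}{r} - 3f\right)\lambda + \frac{sg}{b}(g\lambda + 2\gamma\mu); \quad \lambda(t_f) = 0$$

$$\dot{\epsilon} = -\frac{1}{2}ap - \frac{s}{2r}p^2 + \frac{1}{2b}(g\phi + \gamma\nu)^2; \quad \epsilon(t_f) = \frac{1}{2}s_fp(t_f).$$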

To  second  order  in  h,  therefore,  these  equations  determine  the  optimal  cost  function  and 
thereby  give  the  optimal  control  law  as 


$$u = -\frac{1}{b}[g(s\hat{x} + \phi) + \gamma\nu + \gamma\lambda\hat{x} + g\lambda d + 2\gamma\mu d].$$


Because $-s^2g^2/b$ is the only zeroth-order driving term for $y$, $y$ is of order unity, and since $y/r$ is the driving term for $\mu$, so is $\mu$. But $\gamma\mu$ appears as the driving term for $\lambda$, so $\lambda$ is of order $h$, which implies that $s$ here, and therefore $\epsilon$ as well, differ from those given by Eqs. (35) and (38) by order $h^2$. Thus the systematic inclusion of second-order effects introduces second-order changes in both the optimal cost function and the optimal control law, even when $c$, $\hat{x}$, and $d$ are all zero. Nevertheless, the optimal control law remains unchanged to first order. This phenomenon appears to depend on the coefficient $\lambda$ of the $\hat{x}d$ term in the optimal cost function, and hence on a property of the information state, correlation between the dynamics of $\hat{x}$ and $d$, which has no counterpart in the system state $x$.


PERFECT  STATE  MEASUREMENTS 


In the limiting case in which the current state $x$ is known exactly and can be used in the control law, an optimal cost function $J(x, t)$ can be defined as the conditional expected cost-to-go under an optimal control law given that $x(t) = x$. In this case the Bellman equation corresponding to Eqs. (1) and (3) can be derived in the usual way to give

$$\min_u\left\{J_x(Fx + Gu) + J_t + \mathrm{tr}\left[J_{xx}\left(\tfrac{1}{2}Q + \Gamma'u + \Psi'x\right)\right] + \tfrac{1}{2}(x^TAx + u^TBu) + c^Tu\right\} = 0 \tag{45}$$

to first order in $h$. Differentiating to determine the minimizing control gives

$$u = -B^{-1}[G^TJ_x^T + c + \mathrm{Tr}(J_{xx}\Gamma)]. \tag{46}$$

If Eq. (46) is substituted into Eq. (45) to eliminate the minimization operation, and if the function

$$J(x, t) = \tfrac{1}{2}x^TS(t)x + \eta^T(t)x + \tfrac{1}{2}\delta(t) \tag{47}$$

and its partial derivatives are substituted into the resulting equation, the left-hand side is a quadratic polynomial in $x$ whose coefficients are all identically zero if $S$ satisfies Eq. (35) and if

$$\dot{\eta} = (SGB^{-1}G^T - F^T)\eta + SGB^{-1}[c + \mathrm{Tr}(S\Gamma)] - \mathrm{Tr}(S\Psi); \quad \eta(t_f) = 0 \tag{48}$$

$$\dot{\delta} = [G^T\eta + c + \mathrm{Tr}(S\Gamma)]^TB^{-1}[G^T\eta + c + \mathrm{Tr}(S\Gamma)] - \mathrm{tr}(SQ); \quad \delta(t_f) = 0. \tag{49}$$

Furthermore, the cost function of Eq. (47) satisfies the boundary condition

$$J(x, t_f) = \tfrac{1}{2}x^TS_fx \tag{50}$$

for the terminal values given in Eqs. (35), (48), and (49). Substituting into Eq. (46) and assuming uniqueness and sufficient regularity of solutions to the Bellman equation gives the optimal control law here to first order in $h$ as

$$u = -B^{-1}[G^T(Sx + \eta) + c + \mathrm{Tr}(S\Gamma)]. \tag{51}$$

This control law differs from the classical optimum only by the addition of the deterministic time function $-B^{-1}[G^T(\eta - \phi_0) + \mathrm{Tr}(S\Gamma)]$, where $\phi_0$ is the classical variable defined by Eq. (41). The mean sample trajectory of the optimally controlled system is given by Eqs. (35), (43), (44), and (48), except that this time function replaces $\delta u$ in Eq. (44). Again, the covariance matrix perturbations cause only a second-order change in the optimal expected cost if $x(t_0) = 0$ and $c = 0$.
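A scalar sketch may help make Eqs. (48) and (51) concrete. It assumes that Eq. (35), which is not reproduced in this section, is the standard Riccati equation (in scalar form, ds/dt = -2fs - a + s²g²/b), and that in the scalar case Tr(SΓ) and Tr(SΨ) reduce to sγ and sψ. The function names, the Euler scheme, and the parameter values are illustrative choices, not taken from the report.

```python
import numpy as np


def perfect_measurement_gains(f, g, a, b, c, gamma, psi, s_f, t_f, steps=2000):
    """Integrate scalar forms of Eq. (35) (assumed standard Riccati) and
    Eq. (48) backward from t_f with explicit Euler steps."""
    dt = t_f / steps
    t = np.linspace(0.0, t_f, steps + 1)
    s = np.empty(steps + 1)
    eta = np.empty(steps + 1)
    s[-1], eta[-1] = s_f, 0.0  # terminal conditions
    for k in range(steps, 0, -1):
        s_dot = -2.0 * f * s[k] - a + (s[k] * g) ** 2 / b
        eta_dot = ((s[k] * g * g / b - f) * eta[k]
                   + (s[k] * g / b) * (c + s[k] * gamma)
                   - s[k] * psi)
        s[k - 1] = s[k] - dt * s_dot        # step backward in time
        eta[k - 1] = eta[k] - dt * eta_dot
    return t, s, eta


def control(x, s, eta, g, b, c, gamma):
    """Scalar form of Eq. (51)."""
    return -(g * (s * x + eta) + c + s * gamma) / b


if __name__ == "__main__":
    t, s, eta = perfect_measurement_gains(
        f=0.0, g=1.0, a=1.0, b=1.0, c=0.0, gamma=0.1, psi=0.0, s_f=1.0, t_f=2.0)
    print("eta(t0) =", eta[0], " u(t0) at x = 1:",
          control(1.0, s[0], eta[0], 1.0, 1.0, 0.0, 0.1))
```

With γ = ψ = c = 0 the variable η stays at zero and the sketch returns the classical feedback -gsx/b; a nonzero γ produces the additive deterministic term discussed above.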


THE  ROLE  OF  MEASUREMENT  NOISE 

The effects of measurement noise in this context can be clarified by considering the variable

$$Y(t) = N(t) - S(t). \tag{52}$$

The optimal control law of Eq. (39) can then be expressed in terms of the equations

$$u = -B^{-1}[G^T(S\hat{x} + \eta) + c + \mathrm{Tr}(S\Gamma) + G^T\theta + \mathrm{Tr}(Y\Gamma + R^{-1}HPYPH^TR^{-1}\Omega)] \tag{53}$$

$$\dot{Y} = Y(PH^TR^{-1}H - F) + (H^TR^{-1}HP - F^T)Y - SGB^{-1}G^TS; \quad Y(t_f) = 0 \tag{54}$$

$$\dot{\theta} = (SGB^{-1}G^T - F^T)\theta + SGB^{-1}\mathrm{Tr}(Y\Gamma + R^{-1}HPYPH^TR^{-1}\Omega) - \mathrm{Tr}(Y\Psi); \quad \theta(t_f) = 0 \tag{55}$$

where $P$, $\hat{x}$, $S$, and $\eta$ are as given earlier by Eqs. (23) through (26), (35), and (48). Comparing this realization with Eq. (51) shows that it is the same control law as the optimum for the case of perfect measurements except for the replacement of $x$ by $\hat{x}$ and the addition of the deterministic quantity $-B^{-1}[G^T\theta + \mathrm{Tr}(Y\Gamma + R^{-1}HPYPH^TR^{-1}\Omega)]$, all of whose terms are coupled to the control through the matrix $Y$ (indirectly in the case of $\theta$, where $Y$ appears in the driving term of differential Eq. (55) defining $\theta$). This structure suggests that the concept of certainty-equivalence here should refer to the replacement of $x$ by $\hat{x}$ in the optimal control law for the case of perfect measurements with the same process noise, not the completely deterministic case (the two concepts coincide in the classical perturbation-free problem). With this interpretation of certainty-equivalence, the other additive terms in the control law can be regarded as the “dual control” phenomenon identified by Feldbaum [3]. This phenomenon is the deviation of the optimal control from that which exploits the current state information optimally (interpreted here as certainty-equivalent control) for the purpose of improving the quality of this information for future exploitation. Although the influence of the noise covariance matrix perturbations in this dual control phenomenon is mediated by the matrix $Y$, the values of $Y$ itself are determined entirely by the corresponding classical problem without such perturbations. Hence, $Y$ might be regarded here as a coefficient matrix governing the sensitivity of this classical problem to dual effects caused by noise covariance perturbations of this sort. This matrix plays no role in the classical problem, however, because the conditional covariance of the state cannot be affected by the control there; i.e., there is no interference between the acquisition and exploitation of state information. Also, it follows from Eq. (34) and the definitions of $\hat{P}$ and $D$ that

$$Y = J_D - J_{\hat{x}\hat{x}} = 2J_{\hat{P}} - J_{\hat{x}\hat{x}} \tag{56}$$

in the control problem with noise covariance matrix perturbations.

Although the filter and control gains can be determined separately in the corresponding classical problem by the independent Riccati equations (Eqs. (23) and (35)) for $P$ and $S$, both of these variables enter into Eq. (54), which determines $Y$. This last equation can be regarded as a symmetric linear differential equation in $Y$ with driving term $-SGB^{-1}G^TS$ and zero terminal value. Since $B$ is positive-definite by assumption, this driving term is always at least negative-semidefinite, so $Y(t)$ is symmetric and positive-semidefinite for all $t < t_f$. The fact that $Y(t_f)$ is zero indicates that dual phenomena are unimportant in control problems of sufficiently short duration, which is intuitively reasonable because there is too little time in such cases to take enough advantage of an improved state estimate to justify the cost of achieving it by nonoptimal exploitation of the current estimate. The estimation error in the classical system obeys the differential equation

$$\frac{d}{dt}(x - \hat{x}) = (F - PH^TR^{-1}H)(x - \hat{x}) + (w - PH^TR^{-1}v). \tag{57}$$

If the system $(F, H)$ is observable, then Eq. (57) is stable, which implies that Eq. (54), for $Y$, is stable in reverse time.
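The reverse-time behavior of Eq. (54) can be illustrated with a scalar sketch. As a simplifying assumption that is not made in the report, p and s are held at constant values, so that in reverse time τ = t_f - t the equation becomes dY/dτ = -2(ph²/r - f)Y + s²g²/b, with a closed-form solution to compare against. The function name and parameter values are hypothetical.

```python
import math


def integrate_y(f, g, h, b, r, p, s, horizon, steps=4000):
    """Scalar Eq. (54) integrated in reverse time tau = t_f - t with a
    fourth-order Runge-Kutta scheme; p and s are held constant (assumption)."""
    decay = 2.0 * (p * h * h / r - f)   # positive when the filter error is stable
    drive = (s * g) ** 2 / b            # non-negative driving term
    dtau = horizon / steps
    y = 0.0                             # terminal condition Y(t_f) = 0
    history = [y]
    for _ in range(steps):
        k1 = -decay * y + drive
        k2 = -decay * (y + 0.5 * dtau * k1) + drive
        k3 = -decay * (y + 0.5 * dtau * k2) + drive
        k4 = -decay * (y + dtau * k3) + drive
        y += dtau * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        history.append(y)
    return history


if __name__ == "__main__":
    hist = integrate_y(f=0.2, g=1.0, h=1.0, b=1.0, r=0.5, p=0.6, s=1.2, horizon=5.0)
    exact = 0.72 * (1.0 - math.exp(-10.0))  # (drive/decay)(1 - exp(-decay*tau))
    print(hist[-1], exact)
```

In this setting Y starts at zero at the terminal time, stays non-negative, and saturates at s²g²/[2b(ph²/r - f)] for long horizons, which matches the qualitative statements above about short-duration problems and reverse-time stability.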

There is also a connection between the behavior of $Y$ and the information theory of Shannon [9]. The entropy of the state vector in the classical problem can be determined by standard methods in units of “nats” as

$$\tfrac{1}{2}[n\ln(2\pi e) + \ln|P|].$$

If the measurement process is discretized in small time increments of length $\Delta$, and if $M$ is used to denote the value of $P$ immediately before a measurement, the amount of information that measurement provides the controller about the state is given by the resulting reduction in the entropy, which is asymptotically

$$\tfrac{1}{2}[\ln|M| - \ln|M - MH^TR^{-1}HM\Delta|].$$

Dividing this difference by $\Delta$ and taking the limit as $\Delta \to 0$ gives the information rate of the measurements in nats per unit time as

$$\tfrac{1}{2}\mathrm{tr}(PH^TR^{-1}H).$$

On the other hand, it follows from Eq. (54) that

$$\frac{d}{dt}|Y| = 2\,\mathrm{tr}(PH^TR^{-1}H)|Y| - 2\,\mathrm{tr}(F)|Y| - \mathrm{tr}[SGB^{-1}G^TS\,\mathrm{adj}(Y)]. \tag{58}$$

This implies that at least the determinant of $Y$ will remain close to zero if this information rate is high, which again is consistent with the intuitive interpretation of dual control phenomena. Whether the values of all the $Y$ components remain small, however, will also depend on the structure of the observation system in the general multivariate case.
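The limit that yields the information rate can be checked numerically. The following sketch evaluates the per-measurement entropy reduction for a small increment and compares it with the trace formula; the matrices are arbitrary illustrative values and the function names are hypothetical.

```python
import numpy as np


def entropy_reduction_rate(M, H, R, delta):
    """(1/2)[ln|M| - ln|M - M H^T R^-1 H M delta|] / delta, in nats per unit time."""
    info = H.T @ np.linalg.solve(R, H)              # H^T R^-1 H
    _, logdet_prior = np.linalg.slogdet(M)
    _, logdet_post = np.linalg.slogdet(M - (M @ info @ M) * delta)
    return 0.5 * (logdet_prior - logdet_post) / delta


def information_rate(P, H, R):
    """Limiting rate (1/2) tr(P H^T R^-1 H)."""
    return 0.5 * np.trace(P @ H.T @ np.linalg.solve(R, H))


if __name__ == "__main__":
    P = np.array([[2.0, 0.3], [0.3, 1.0]])
    H = np.array([[1.0, 0.0]])
    R = np.array([[0.5]])
    for delta in (1e-2, 1e-4, 1e-6):
        print(delta, entropy_reduction_rate(P, H, R, delta), information_rate(P, H, R))
```

The finite-increment value approaches the trace expression as the increment shrinks, with a discrepancy that is first order in the increment.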

The dual aspect of the optimal control of Eq. (53) arises in two ways. One is by the direct addition of a term depending only on the current values of $Y$ and of the control-dependent noise perturbation coefficients $\Gamma$ and $\Omega$. The other is through the current value of $\theta$, which in turn depends on all future values of these quantities and of the state-dependent noise perturbation coefficient $\Psi$ as well. These two effects correspond roughly to the phenomena of “caution” and “probing” identified by Bar-Shalom and Tse [10] in connection with their and Meier’s more general “wide-sense adaptive” approach [11,12] to dual control problems.
AN ALTERNATE CRITERION

To what extent do the preceding results depend on the special nature of the underlying linear-quadratic-Gaussian control problem? It is instructive to consider a variant of this problem, solved recently by Speyer, Deyst, and Jacobson [2], in which the quadratic performance index (3) is replaced by the exponential criterion

$$J = E\left\{\mu\exp\left[\frac{\mu}{2}\left(x^T(t_f)S_fx(t_f) + \int_{t_0}^{t_f}u^TBu\,dt\right)\right]\right\} \tag{59}$$

where $\mu$ is a scalar. If the preceding state and control dependences of the noise covariance matrices are introduced in this context, the state estimation results are the same as before. Hence it is meaningful to consider an optimal expected cost-to-go function defined to first order in $h$ as


$$J(\hat{x}, D, t) = E\left\{\mu\exp\left[\frac{\mu}{2}\left(x^T(t_f)S_fx(t_f) + \int_t^{t_f}u^TBu\,dt\right)\right]\;\middle|\;\hat{x}(t) = \hat{x},\ \hat{P}(t) = P(t) + 2D\right\} \tag{60}$$

where $u$ is generated by an optimal control law.


It follows from this definition that

$$J(\hat{x}, D, t_f) = E_{x/\hat{x},D}\left[\mu e^{\frac{1}{2}\mu x^TS_fx}\right]. \tag{61}$$

Assuming that $P(t_f)$ from Eq. (23) is invertible and that $P^{-1}(t_f) > \mu S_f$, this expectation can be evaluated for the class of conditional state distributions encountered here by completing the square in the exponent and using standard results for the moments of Gaussian distributions. With much manipulation this result can be expanded to first order in $h$ as


J(x,  D,  tf)  = — |i;2  expj l&T[^Mf  + + iitr(MfD) 

+ p2 xTSf(P}1  -nSf^TriSfiP}1  -^Sf)-lAfPf] 

+ ^ trlxxTlSfir,1  -iiSfr'Afrf-iiSfr'SfYrf  -pS^S^lj 

J (62) 

where $P_f$ and $\Lambda_f$ denote $P(t_f)$ and $\Lambda(t_f)$, and

$$M_f = S_f + \mu S_f(P_f^{-1} - \mu S_f)^{-1}S_f. \tag{63}$$
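The role of M_f can be illustrated in the scalar, unperturbed case (D = 0 and no skewness), where the terminal expectation of Eq. (61) is a standard Gaussian integral. The sketch below compares a brute-force quadrature with the closed form μ(1 - μ p s_f)^(-1/2) exp(μ M_f m²/2), which is the textbook result for a Gaussian with mean m and variance p and is stated here as an assumption about how Eq. (62) specializes. Function names and parameter values are illustrative.

```python
import numpy as np


def closed_form(mean, var, s_f, mu):
    """Scalar Gaussian expectation of mu*exp(mu*s_f*x^2/2), using the scalar
    form of Eq. (63) for M_f. Requires 1/var > mu*s_f."""
    m_f = s_f + mu * s_f * (1.0 / (1.0 / var - mu * s_f)) * s_f
    return mu * (1.0 - mu * var * s_f) ** -0.5 * np.exp(0.5 * mu * m_f * mean ** 2)


def quadrature(mean, var, s_f, mu, half_width=12.0, points=200001):
    """Same expectation by a uniform-grid Riemann sum over +/- half_width sigmas."""
    sd = np.sqrt(var)
    x = np.linspace(mean - half_width * sd, mean + half_width * sd, points)
    density = np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)
    integrand = mu * np.exp(0.5 * mu * s_f * x ** 2) * density
    return np.sum(integrand) * (x[1] - x[0])


if __name__ == "__main__":
    print(closed_form(0.7, 0.5, 1.0, 0.4), quadrature(0.7, 0.5, 1.0, 0.4))
```

The condition 1/var > μ s_f in the sketch is the scalar version of the requirement P⁻¹(t_f) > μS_f stated above; without it the integral diverges.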

For a small time increment $\Delta$, $J$ obeys the following recursion relation to first order in $\Delta$:

$$J(\hat{x}, D, t) = \min_u E_{x,w,v/\hat{x},D}\left[e^{\frac{1}{2}\mu\Delta u^TBu}J(\hat{x} + \Delta\hat{x}, D + \Delta D, t + \Delta)\right], \tag{64}$$

where $\Delta\hat{x}$ and $\Delta D$ are the increments in $\hat{x}$ and $D$ that occur in the time interval $[t, t + \Delta]$ when control $u$ is used. Expanding $J$ to first order in $\Delta$ with a Taylor series about $(\hat{x}, D, t)$, using the dynamics of Eqs. (24) and (25), and taking expectations in the usual way gives the Bellman equation

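$$J(\hat{x}, D, t) = \min_u e^{\frac{1}{2}\mu\Delta u^TBu}\Big\{J(\hat{x}, D, t) + \Delta\Big[J_t + J_{\hat{x}}(F\hat{x} + Gu) + \mathrm{tr}\Big(J_D[(F - PH^TR^{-1}H)D + D(F^T - H^TR^{-1}HP) + \Psi'\hat{x} + \Gamma'u + PH^TR^{-1}\Omega'uR^{-1}HP] + J_{\hat{x}\hat{x}}[\tfrac{1}{2}PH^TR^{-1}HP + DH^TR^{-1}HP + PH^TR^{-1}HD - PH^TR^{-1}\Omega'uR^{-1}HP]\Big) + \sum_{ijk}(P\Lambda PH^TR^{-1}HP)_{ijk}(PJ_{D\hat{x}})_{jik}\Big]\Big\}. \tag{65}$$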

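Cyclically permuting matrix products in the trace operand, expanding the exponential factor in a Taylor series, subtracting $J(\hat{x}, D, t)$ from both sides of the resulting equation, dividing by $\Delta$, and neglecting higher order terms in $\Delta$ gives

$$\min_u\Big(\tfrac{1}{2}\mu(u^TBu)J + J_t + J_{\hat{x}}(F\hat{x} + Gu) + u^T\mathrm{Tr}[PH^TR^{-1}\Omega R^{-1}HP(J_D - J_{\hat{x}\hat{x}}) + \Gamma J_D] + \hat{x}^T\mathrm{Tr}(\Psi J_D) + \mathrm{tr}\{[J_DF + F^TJ_D - (J_D - J_{\hat{x}\hat{x}})PH^TR^{-1}H - H^TR^{-1}HP(J_D - J_{\hat{x}\hat{x}})]D + \tfrac{1}{2}J_{\hat{x}\hat{x}}PH^TR^{-1}HP\} + \sum_{ijk}(P\Lambda PH^TR^{-1}HP)_{ijk}(PJ_{D\hat{x}})_{jik}\Big) = 0. \tag{66}$$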

Equating the $u$-derivative of Eq. (66) to zero gives the minimizing control as

$$u = -\frac{B^{-1}}{\mu J}\{G^TJ_{\hat{x}}^T + \mathrm{Tr}[PH^TR^{-1}\Omega R^{-1}HP(J_D - J_{\hat{x}\hat{x}}) + \Gamma J_D]\}. \tag{67}$$

Substituting Eq. (67) into Eq. (66) to eliminate the minimization operation gives the following partial differential equation for $J$:

$$J_t + [J_{\hat{x}}F + \mathrm{Tr}^T(\Psi J_D)]\hat{x} + \mathrm{tr}\{[J_DF + F^TJ_D - (J_D - J_{\hat{x}\hat{x}})PH^TR^{-1}H - H^TR^{-1}HP(J_D - J_{\hat{x}\hat{x}})]D + \tfrac{1}{2}J_{\hat{x}\hat{x}}PH^TR^{-1}HP\} - \frac{1}{2\mu J}\{J_{\hat{x}}G + \mathrm{Tr}^T[PH^TR^{-1}\Omega R^{-1}HP(J_D - J_{\hat{x}\hat{x}}) + \Gamma J_D]\}B^{-1}\{G^TJ_{\hat{x}}^T + \mathrm{Tr}[PH^TR^{-1}\Omega R^{-1}HP(J_D - J_{\hat{x}\hat{x}}) + \Gamma J_D]\} + \sum_{ijk}(P\Lambda PH^TR^{-1}HP)_{ijk}(PJ_{D\hat{x}})_{jik} = 0. \tag{68}$$


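An exact solution to Eqs. (68) and (62) is not known. However, for a cost function of the form

$$J(\hat{x}, D, t) = \mu\alpha^{-1/2}\exp\left[\mu\hat{x}^T\left(\tfrac{1}{2}M + \mu MDM\right)\hat{x} + \mu\phi^T\hat{x} + \mu\,\mathrm{tr}(ND) + \frac{\mu^2}{3}\mathrm{tr}(\hat{x}\hat{x}^T\Pi\hat{x})\right] \tag{69}$$

where $M$, $N$, and $\Pi$ are symmetric and the components of $\phi$ and $\Pi$ are all of order $h$, it can be verified that

$$J_{\hat{x}} = \mu[\hat{x}^T(M + 2\mu MDM) + \phi^T + \mu\hat{x}^T\Pi\hat{x}]J \tag{70}$$

$$J_D = \mu(N + \mu M\hat{x}\hat{x}^TM)J \tag{71}$$

$$J_{\hat{x}\hat{x}} = \mu\{M + 2\mu MDM + 2\mu\Pi\hat{x} + \mu[(M + 2\mu MDM)\hat{x} + \phi + \mu\Pi\hat{x}\hat{x}][\hat{x}^T(M + 2\mu MDM) + \phi^T + \mu\hat{x}^T\Pi\hat{x}]\}J \tag{72}$$

$$J_{D\hat{x}} = \mu^2[(M\hat{x}^TM)' + (M\hat{x}^TM)'' + (N + \mu M\hat{x}\hat{x}^TM)\hat{x}^TM]J + \text{terms of order } h \tag{73}$$

$$J_t = \mu\left[\hat{x}^T\left(\tfrac{1}{2}\dot{M} + \mu\dot{M}DM + \mu MD\dot{M}\right)\hat{x} + \dot{\phi}^T\hat{x} + \mathrm{tr}(\dot{N}D) + \frac{\mu}{3}\mathrm{tr}(\hat{x}\hat{x}^T\dot{\Pi}\hat{x}) - \frac{\dot{\alpha}}{2\mu\alpha}\right]J. \tag{74}$$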



Substituting Eqs. (69) through (74) into Eqs. (62) and (68), neglecting terms of second order in $h$, and equating coefficients of like powers of $\hat{x}$ and $D$ shows after much manipulation that the Bellman equation and boundary condition are satisfied to first order in $h$ if

$$N = M + Y \tag{75}$$

and

$$\dot{M} = -MF - F^TM + M(GB^{-1}G^T - \mu PH^TR^{-1}HP)M; \quad M(t_f) = M_f \tag{76}$$

$$\dot{Y} = Y(PH^TR^{-1}H - F) + (H^TR^{-1}HP - F^T)Y - MGB^{-1}G^TM; \quad Y(t_f) = 0 \tag{77}$$

0 = [M(GB~lGT  - iiPHTR'^HP)  - FT]<t>  + MGB~lTr(TN  + PH'rR~1YLR~1HPY) 

-fjMPHTR-lHP  Tr(PAPY)  - Tr^N  + uPf^R-^HPU)- nM  Tr(.{(PAPHTR'lHP)'P 
+ [(PAPHTR-iHPyP] ' + [(PAPHTR-'HPyP]  "}M); 

- pSf)~^  TrlSfiP}1  - nSfY1  \fPf]  (78) 
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ri  = 0 + 0'  + 0";  \\(tf)  = n[ Sf(rfl  - iisfrl \f(p-f'  - nsfrlsf] \rfl  - 

0 = n [(GB-'(P  -nPHTR-lHP)M  - F]  + [(MTM)'B~1Gt  - M'V  - fi(MPAPM)'  J (79) 

phtr~ihp]m 


$$\dot{\alpha} = \mu\alpha\,\mathrm{tr}(MPH^TR^{-1}HP); \quad \alpha(t_f) = |I - \mu P_fS_f|. \tag{80}$$

Furthermore, the solutions to Eqs. (75) through (80) are such that $M$, $N$, and $\Pi$ are symmetric and $\phi$ and $\Pi$ are of order $h$. Under the appropriate uniqueness and regularity conditions, therefore, they and Eq. (69) determine the optimal cost function to first order in $h$. From Eq. (67), the optimal control law is

$$u = -B^{-1}\{G^T[M + \mu(\Pi\hat{x} + 2MDM)]\hat{x} + G^T\phi + \mathrm{Tr}[PH^TR^{-1}\Omega R^{-1}HPY + \Gamma(N + \mu M\hat{x}\hat{x}^TM)]\} \tag{81}$$

to first order. The variable $M$ here corresponds to the variable $Q$ in Speyer et al. [2], and these results reduce to theirs when $\Gamma$, $\Omega$, and $\Psi$ are all zero.


Role  of  Measurement  Noise 

In the limiting case of perfect state measurements, an optimal expected cost-to-go function $J(x, t)$ can be unambiguously defined as the conditional expected cost-to-go under an optimal control law, given $x(t) = x$. A similar derivation shows that the Bellman equation for this case is

$$J_xFx + \mathrm{tr}\left[J_{xx}\left(\tfrac{1}{2}Q + \Psi'x\right)\right] - \frac{1}{2\mu J}[J_xG + \mathrm{Tr}^T(J_{xx}\Gamma)]B^{-1}[G^TJ_x^T + \mathrm{Tr}(J_{xx}\Gamma)] + J_t = 0; \quad J(x, t_f) = \mu e^{\frac{1}{2}\mu x^TS_fx} \tag{82}$$

to first order in $h$, and that the corresponding optimal control law is

$$u = -\frac{B^{-1}}{\mu J}[G^TJ_x^T + \mathrm{Tr}(J_{xx}\Gamma)]. \tag{83}$$

It is a matter of straightforward substitution to verify that this Bellman equation is satisfied to first order by the function

$$J(x, t) = \mu\alpha^{-1/2}\exp\left[\tfrac{1}{2}\mu x^TMx + \mu\eta^Tx + \frac{\mu^2}{3}\mathrm{tr}(xx^T\Pi x)\right] \tag{84}$$

if

$$\dot{M} = -MF - F^TM + M(GB^{-1}G^T - \mu Q)M; \quad M(t_f) = S_f \tag{85}$$

$$\dot{\eta} = [M(GB^{-1}G^T - \mu Q) - F^T]\eta + MGB^{-1}\mathrm{Tr}(\Gamma M) - \mathrm{Tr}(\Psi M + \mu\Pi Q); \quad \eta(t_f) = 0 \tag{86}$$

$$\dot{\Pi} = \Theta + \Theta' + \Theta''; \quad \Pi(t_f) = 0, \quad \text{where } \Theta = \Pi[(GB^{-1}G^T - \mu Q)M - F] + (M\Gamma M)'B^{-1}G^TM - M\Psi'M \tag{87}$$

$$\dot{\alpha} = \mu\alpha\,\mathrm{tr}(MQ); \quad \alpha(t_f) = 1 \tag{88}$$

and that the optimal control law can be expressed as

$$u = -B^{-1}\{G^T(M + \mu\Pi x)x + G^T\eta + \mathrm{Tr}[\Gamma(M + \mu Mxx^TM)]\}. \tag{89}$$

This reduces to the result of Jacobson [13] when $\Gamma$ and $\Psi$ are identically zero.

To investigate the dual control phenomena here, it is necessary to determine first which part of the optimal control law (Eq. (81)) constitutes “optimal exploitation of current state information.” It was natural to interpret the latter as an extended form of certainty-equivalent control in the case of the quadratic criterion, but it is clear from the results of Jacobson [13] and Speyer et al. [2] that this form of certainty-equivalence does not hold for the exponential criterion even in the classical case without noise covariance perturbations, because the $M$ of Eq. (76) differs from the $M$ of Eq. (85). However, a natural extension of this property does hold in this case. Comparing derivatives and boundary conditions shows that the $M$ of Eq. (76) satisfies

$$M = S[I - \mu(K + P)S]^{-1} = [I - \mu S(K + P)]^{-1}S = S + \mu S[(K + P)^{-1} - \mu S]^{-1}S \tag{90}$$

where $S$ is as given by Eq. (35) with $A = 0$ and

$$\dot{K} = FK + KF^T - Q; \quad K(t_f) = 0 \tag{91}$$

and that the $M$ of Eq. (85) satisfies

$$M = S(I - \mu KS)^{-1} = (I - \mu SK)^{-1}S.$$

Since $S$ and $K$ are independent of the measurement process parameters, this means that the instantaneous value of the optimal control for both noisy and perfect measurements can be realized as the functional composition

$$u = -B^{-1}G^T[I - \mu S(K + P)]^{-1}S\hat{x} \tag{92}$$

of a control law determined entirely by the problem with perfect measurements operating on the mean $\hat{x}$ and covariance matrix $P$ of the current conditional state distribution, where these parameters are taken respectively as $x$ and $0$ in the case of perfect measurements. This decomposition therefore shares the essential properties of the refined certainty-equivalence concept described in the preceding section. The main difference is that the construction here is slightly more elaborate and involves the covariance matrix generated by the state estimator as well as the mean.
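The three expressions for M in Eq. (90) are related by the matrix "push-through" identity, which can be confirmed numerically. The sketch below uses randomly generated symmetric positive-definite stand-ins for S and K + P and a small μ so that all inverses exist; these values and the function name are illustrative assumptions, not solutions of the Riccati equations in the report.

```python
import numpy as np


def m_forms(S, K, P, mu):
    """Evaluate the three algebraically equivalent forms of M in Eq. (90)."""
    identity = np.eye(S.shape[0])
    X = K + P
    form_a = S @ np.linalg.inv(identity - mu * X @ S)        # S [I - mu (K+P) S]^-1
    form_b = np.linalg.solve(identity - mu * S @ X, S)        # [I - mu S (K+P)]^-1 S
    form_c = S + mu * S @ np.linalg.inv(np.linalg.inv(X) - mu * S) @ S
    return form_a, form_b, form_c


def random_spd(rng, n):
    """Random symmetric positive-definite matrix for testing."""
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, K, P = (random_spd(rng, 3) for _ in range(3))
    a, b, c = m_forms(S, K, P, mu=0.005)
    print(np.max(np.abs(a - b)), np.max(np.abs(a - c)))
```

Setting P to zero in the same function gives the perfect-measurement expression S(I - μKS)⁻¹, which is what allows a single control law to serve both cases once the estimator's covariance is supplied as an argument.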

This idea can be extended to the context of noise covariance perturbations by showing from Eq. (90) that

$$M + 2\mu MDM = [I - \mu S(K + P + 2D)]^{-1}S$$

to first order in $h$, and showing from Eqs. (90), (87), (79), (35), (26), and (23) that

n = {[/  - nS(K  + P)]  -1  (T  + p(SPABS)'PS]  [/  - n(K  + P)S}-1}'[I  - »(K  + P)S) ' 1 , (93) 

where 


f = © + ©'  + 0";  0 = T(GB-1  GTS  -F)  + (SrS)'B~1GTS 

-( S*S)\l-nKSY,  T(t,)  = 0 (94) 

and that the $\Pi$ for perfect measurements is given by Eq. (93) with $P = 0$ and $\Lambda = 0$. Since Eq. (94) is also independent of the measurement process, it follows from Eq. (81) that a similar realization of the optimal control law here can be constructed to first order in $h$ as

$$u = -B^{-1}\big(G^T[I - \mu S(K + P + 2D)]^{-1}S\hat{x} + \mu\,\mathrm{Tr}\{[(\Pi G)'' + M\Gamma M]\hat{x}\hat{x}^T\} + \mathrm{Tr}(\Gamma M) + G^T\phi + \mathrm{Tr}[(\Gamma + PH^TR^{-1}\Omega R^{-1}HP)Y]\big) \tag{95}$$

where $M$ and $\Pi$ now denote the expressions in Eqs. (90) and (93). The optimal control for the case of perfect measurements is given by Eq. (95) with $\hat{x} = x$ and $P = D = 0$ there and in Eqs. (90) and (93), and with the last two terms of Eq. (95) replaced by $G^T\eta$.

This construction shows that the optimal control law can be realized as the sum of a certainty-equivalent control law, in the extended sense proposed here, and a residual deterministic term. If “optimal exploitation of current state information” is interpreted as certainty-equivalent control in this sense, then the dual control phenomenon here is this residual term, an additive deterministic time function as in the case of the quadratic criterion. The portion of the deterministic terms in Eq. (95) to be included in the certainty-equivalent control law is somewhat arbitrary, however, because this form of certainty-equivalence allows the use of the deterministic time functions $P$ and $\Lambda$ as arguments in the control law. It would be ideal if $\phi$ in Eq. (95) could be decomposed as

$$\phi(t) = f_1[P(t), \Lambda(t), t] + Y(t)f_2[P(t), \Lambda(t), t] + f_3[P(t), t]\theta(t) \tag{96}$$

with

$$\dot{\theta}(t) = L(t)\theta(t) + Y(t)f_4[P(t), \Lambda(t), t]; \quad \theta(t_f) = 0$$

$$\eta(t) = f_1(0, 0, t)$$

where $f_1$ through $f_4$ are independent of the measurement process and $L$ depends only on the completely deterministic problem. The dual control could then be reasonably identified in analogy with the case of the quadratic criterion as the deterministic term

$$-B^{-1}\{G^T[Yf_2(P, \Lambda) + f_3(P)\theta] + \mathrm{Tr}[(\Gamma + PH^TR^{-1}\Omega R^{-1}HP)Y]\}$$

coupled to the control through the $Y$ matrix.

Such a decomposition of $\phi$ has not been found for the general case. In the special case of classical process noise ($\Gamma = 0$, $\Psi = 0$), however, $\Lambda$, $T$, and $\eta$ are all zero, and the optimal control law can be realized as

$$u = -B^{-1}\{G^T[I - \mu S(K + P + 2D)]^{-1}S\hat{x} + G^T[I - \mu S(K + P)]^{-1}\theta + \mathrm{Tr}(PH^TR^{-1}\Omega R^{-1}HPY)\}$$

where $\theta$ is given by Eq. (55) with $\Gamma = 0$, $\Psi = 0$, and $Y$ as given by Eq. (77) rather than Eq. (54). The dual control in this case is therefore

$$-B^{-1}\{G^T[I - \mu S(K + P)]^{-1}\theta + \mathrm{Tr}(PH^TR^{-1}\Omega R^{-1}HPY)\}.$$

The variable $Y$ in this context differs from that for the quadratic criterion because $M$ replaces $S$ in the driving term of the defining differential equation, Eq. (77). Also, it follows from Eqs. (71) and (72) that

$$Y = \frac{J_D - J_{\hat{x}\hat{x}}}{\mu J}$$

except for approximately infinitesimal terms. The qualitative behavior of this $Y$ remains the same as that of Eq. (54), however.


MEASUREMENT  NOISE  STATE  DEPENDENCE 

The case of state-dependent measurement noise covariance matrices is analyzed in discrete time because of difficulties described earlier. For this purpose, the following control problem is considered:

$$x_{i+1} = F_ix_i + G_iu_i + w_i; \quad x_0 \sim \text{Normal}(\hat{x}_0, P_0) \quad \text{(dynamics)}$$

$$z_i = H_ix_i + v_i \quad \text{(state measurements)}$$

$$J = \frac{1}{2}E\left[x_N^TS_fx_N + \sum_{i=0}^{N-1}(x_i^TA_ix_i + u_i^TB_iu_i)\right] \quad \text{(criterion to be minimized)}$$

where $F_i^{-1}$ exists, and $\{w_i\}$ and $\{v_i\}$ are independent zero-mean normal random variables, given the current state and control history, such that

$$\mathrm{cov}(w_i) = Q_i + 2\Gamma_i'u_i + 2\Psi_i'x_i; \quad i = 0, \ldots, N - 1$$

$$\mathrm{cov}(v_i) = R_i + 2\Omega_{i-1}'u_{i-1} + 2T_i'x_i; \quad i = 1, \ldots, N$$

and the components of $\Gamma_i$, $\Psi_i$, $\Omega_i$, and $T_i$ are all of order $h$. The convention here is that measurement $z_i$ is available at epoch $i$ before control $u_i$ is chosen, except at the initial epoch, when there is no measurement and $u_0$ is chosen on the basis of the prior state distribution. The linear term in the control is excluded from the criterion for simplicity, but this is otherwise the discrete-time analog of the control problem given by Eqs. (1) through (3).
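A minimal simulation sketch of one step of these dynamics is given below. It assumes a particular reading of the primed products: each coefficient is stored as a three-index array, symmetric in its first two indices, and the product with a vector contracts the third index, so that the result is a symmetric matrix. That index convention, the function names, and the numbers are assumptions made for illustration only.

```python
import numpy as np


def perturbed_cov(base, terms):
    """base + 2 * sum_k coeff[:, :, k] * vec[k] over the (coeff, vec) pairs in terms."""
    cov = base.copy()
    for coeff, vec in terms:
        cov = cov + 2.0 * np.einsum("jkl,l->jk", coeff, vec)
    return cov


def process_noise_cov(Q, Gamma, Psi, u, x):
    """cov(w_i) = Q_i + 2 Gamma_i' u_i + 2 Psi_i' x_i under the assumed convention."""
    return perturbed_cov(Q, [(Gamma, u), (Psi, x)])


def step(F, G, Q, Gamma, Psi, x, u, rng):
    """Draw one sample of x_{i+1} = F x + G u + w with state/control-dependent noise."""
    W = process_noise_cov(Q, Gamma, Psi, u, x)
    w = rng.multivariate_normal(np.zeros(len(x)), W)
    return F @ x + G @ u + w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    F = np.array([[1.0, 0.1], [0.0, 1.0]])
    G = np.array([[0.0], [0.1]])
    Q = np.array([[1.0, 0.2], [0.2, 0.5]])
    Gamma = np.zeros((2, 2, 1))
    Gamma[:, :, 0] = 0.05 * np.eye(2)       # small, order-h perturbation
    Psi = np.zeros((2, 2, 2))
    x, u = np.array([1.0, -1.0]), np.array([2.0])
    print(process_noise_cov(Q, Gamma, Psi, u, x))
    print(step(F, G, Q, Gamma, Psi, x, u, rng))
```

The measurement noise covariance could be built the same way from R, Ω, and T; the perturbation coefficients must stay small enough that the resulting matrices remain positive-definite, which is implicit in the order-h assumption.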


State  Estimation 

Suppose that the conditional density of the state at epoch $i$ after the receipt of $z_i$ is of the form of Eq. (8), with parameters $\bar{x}_i$, $V_i$, $\lambda_i$, $L_i$, $\Lambda_i$ and with corresponding mean and covariance denoted by $\hat{x}_i$ and $\hat{P}_i$. The density of $w_i$ given $x_i$ and $u_i$ is zero-mean Normal with covariance matrix $Q + 2\Psi_i'x_i$, where $Q$ denotes $Q_i + 2\Gamma_i'u_i$. Letting $s$ denote $F_ix_i$ implies that $s$ has a density of the form of Eq. (8), such that

$$\bar{x} = F_i\bar{x}_i$$

$$V = F_iV_iF_i^T$$

$$\lambda^T = \lambda_i^TF_i^{-1}$$

$$L = (F_i^{-1})^TL_iF_i^{-1}$$

$$\Lambda = [(F_i^{-1})^T\Lambda_iF_i^{-1}]'F_i^{-1}$$

by Eq. (11). Again assuming for convenience that $Q^{-1}$ exists, it follows that

$$p_{w_i/s}(w, s) = p_{w_i/x_i}(w, F_i^{-1}s) = \{1 - \mathrm{tr}[Q^{-1}\Psi's(I - Q^{-1}ww^T)]\}\frac{e^{-\frac{1}{2}w^TQ^{-1}w}}{(2\pi)^{n/2}|Q|^{1/2}}$$

to first order in $h$, where $\Psi' = \Psi_i'F_i^{-1}$. If $r = s + w_i$, then


p(r)-p(s  + Wj)=  f Ps(r-w)pw./s(w,r-w)dw 
JRn 

= f k(w) g-lUlwTQ-lw+ir-w-xflv-^lr-w-x)]  j 

JRn  (2n)n\VQ\ll2 


to  first  order  in  h,  where 


k(u>)  = 1 + \T (r  - ui  - x)  + tr  L[(r  - ic  - x)(r  - u;  - 5c )T  - V] 

+ -|  (r  - w - x)(r  - w - x)tA (r  - w - x)  - Q'1  ^'(r  - u;)  [/  - Q-i  wwT ] j . 
Completing the square in the exponent gives

^-1  «r-i>Cl#-»lr-i>  .*  ^-1  *(ir-</- V*#->  gr-i  )|  T(  V- VA/- 1 VO-1 1 •"-(/- VAf-1  (r-i)] 

pir)  v "4  <2*)i/2"iv'- vm~x v\xn 

where $M = V + Q$. The integral is the expected value of a third-degree polynomial in $w$ and $(r - \bar{x})$ with respect to a Normal distribution whose mean is proportional to $(r - \bar{x})$. Therefore, it can be expressed in the form


constant  ♦ V(r-i)  + lr 


\ L[ir 


x Mr 


s 


x)T -M)  * - (r  - x)(r  - x) 


A(r 


-5)|, 


where  the  constant  is  independent  of  (r  - jf).  L can  be  taken  as  symmetric  because  Af  is, 
and  1 and  X are  of  order  h because  X,  L,  and  A are.  Since  p(r)  is  a probability  density, 
the  constant  term  must  be  unity,  by  Eq.  (8).  Carrying  out  the  details  of  the  third- 
degree  terms  in  this  expectation  shows  that 

A = (Af'1  V\VM~X  )'VM-'  + (Af-1’W#-l)’VA#-1  +[(Af~1  ♦Af"1  )'VM~X  ] ’ 

which  is  also  symmetric  and  of  order  h.  Therefore,  p(r)  is  of  the  form  of  Eq.  (8). 
Decomposing  expectations  into  marginals  of  conditionals  over  s shows  that 


£(r)  = £(s)  = £,x, 


and 

cov(r)  = cov(s)  + Q + 2*  x = £,P,£f  + Q,  + 2rjui  + 2'f'jx, 


to  first  order  in  h. 

If  xl+1  is  denoted  by  y for  convenience,  then_y  = r + G,u,-  andy  has  a density  of 
the  form' of  Eq.  (8)  with  parameters  (x,  Af,  X,  L,  A),  where  Af  and  A are  as  used  pre- 
viously, x denotes  the  preceding  variable  x plus  G,u,-,  and  X and  L are  such  that 

E(y)  = Fixi  + Giui±xi+1  (97) 

cov(y)  = cov(r)  4 N 


o first  order  in  h.  Denoting  zI+ j,  T|+1 , and  by  z,  T,  and  H implies  that 
$$p(z/y) = \left[1 - \operatorname{tr}\left\{R^{-1}T'y\left[I - R^{-1}(z - Hy)(z - Hy)^T\right]\right\}\right] \frac{\exp\left[-\tfrac{1}{2}(z - Hy)^T R^{-1}(z - Hy)\right]}{(2\pi)^{\dim(z)/2}\,|R|^{1/2}}$$

to first order in $h$ for a specified $u_i$, where $R = R_{i+1} + 2\Omega'_{i+1}u_i$. As a function of $y$, the
conditional density $p(y/z)$ is proportional to $p(z/y)p(y)$. Completing the square in the
exponent of this product and using the “matrix inversion lemma” gives


P(y/z)  + Xr(y  -*)  + L[(y  -*)(y  -x)T  - M\  + |-  (y  - x)(y  -x)TA(y  -3c) 


- R^T'y  [I  - R -1  (z  - Hy)(z  - Hyf ] 
where  g is  a constant  of  proportionality  and 


}) 


exp  - — (y  -y)TV,-1(y  -y) 
(2ir)1/2n|V|1/2 


$$V = M - MH^T(R + HMH^T)^{-1}HM$$

$$\hat{y} = \bar{x} + VH^TR^{-1}(z - H\bar{x}).$$
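
The role of the matrix inversion lemma in this step can be checked numerically. The sketch below is only an illustration under assumed, randomly generated matrices (the dimensions, the seed, and the helper `random_spd` are hypothetical and not from the report). It compares the covariance form $V = M - MH^T(R + HMH^T)^{-1}HM$ with the information form $(M^{-1} + H^TR^{-1}H)^{-1}$ that arises from completing the square, and checks that the gain $VH^TR^{-1}$ in the conditional mean equals $MH^T(R + HMH^T)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 4, 2  # hypothetical state and measurement dimensions


def random_spd(k):
    """Random symmetric positive definite k x k matrix (illustrative only)."""
    a = rng.standard_normal((k, k))
    return a @ a.T + k * np.eye(k)


M = random_spd(n)                 # prior covariance parameter
R = random_spd(p)                 # measurement noise covariance
H = rng.standard_normal((p, n))   # measurement matrix

# Covariance form: V = M - M H^T (R + H M H^T)^{-1} H M
V = M - M @ H.T @ np.linalg.solve(R + H @ M @ H.T, H @ M)

# Information form obtained by completing the square in the exponent
V_info = np.linalg.inv(np.linalg.inv(M) + H.T @ np.linalg.solve(R, H))

# Two equivalent expressions for the gain multiplying (z - H x_bar)
gain_a = V @ H.T @ np.linalg.inv(R)
gain_b = M @ H.T @ np.linalg.inv(R + H @ M @ H.T)

print(np.allclose(V, V_info), np.allclose(gain_a, gain_b))
```

The covariance form avoids inverting $M$, which is why it is the one usually carried in recursions of this kind.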


The  polynomial  factor  in  p(y/z)  can  be  expressed  to  first  order  in  h as 

1 + bT( y-y)  + trji  L*[(y  - y)(y  - y)T  - V]  + (y  -y)(y  - y)TA*(y  -y)J 

where 

6 = X + [ LVHTR -1  - 2//tP"1T'x(.R  + HMH Trx  ] (z  - Hx) 

+ Tr  (z  - Hx)(z  - Hic)T {R-'HVAVHTR-l  -2 R-'HV[(R  + HMHT)-'rR-'H] " 

+ (R  +HMHT)~'T(R  + HMHT)~1}  - TP'* 

L*  = L + 2(H^«-1Tfl-l/f)x  + 2{[A  + (HTR-irR~iH)']MHT  -rf^R'1  T 
- (TJI” (P  + HMHTyi (z-Hx) 

A*  = A + 

Since $L^*$ and $\Lambda^*$ are symmetric, $p(y/z)$ is a density of the form of Eq. (8). With the use
of earlier definitions and results, it follows from Eqs. (9) and (10) that its mean $\hat{x}_{i+1}$ and
covariance matrix $\mathcal{P}_{i+1}$ are given to first order in $h$ by

if+l  = *,+!  + Nf^iR  + 2T'*I+1  + HNHt)~ 1 (z  - //*,+! ) + V Tr{(R  + HNHT)~ 1 [HNANHT 
+ T - 2(#rP-1T'AW)']  (P  + HNHT)-1  [(z  - H3c,+i  )(z  - Hx )T  - (P  + /PV//T)] } 

Pi+1  = Af  - NHT^R  + 2T'*i+i  + HNHt)'1HN  + 2{  [V(A  + /fTP-1TR-1H)V']'lV/f7 
- [ + r'fl-i/OV] '}(«  + HNHT)-l (z  - f/*j+i ). 
If the covariance matrix perturbations $D_i$ are defined as $(\mathcal{P}_i - P_i)/2$, where $\mathcal{P}_i$
denotes the conditional state covariance at epoch $i$ and $\{P_i\}$ is the sequence of nominal
covariance matrixes defined recursively as

$$P_i = M_i - M_i H_i^T (R_i + H_i M_i H_i^T)^{-1} H_i M_i; \quad P_0 \text{ specified} \tag{98}$$

$$M_{i+1} = F_i P_i F_i^T + Q_i \tag{99}$$

then it follows by induction on $i$ that the components of $D_i$ are of order $h$. It is a
straightforward but lengthy matter of substitution into the original definitions to show
by induction that the conditional mean $\hat{x}_i$ and covariance matrix $(P_i + 2D_i)$ of the state at
epoch $i$ are determined to first order in $h$ by Eqs. (97) through (99) and the following
equations:

+ <W*1  + 2(/-pi*ifil1R^iHi+1)(FiDiFj'  + r;U(.  + ^.)]Df+1 
- ^t+iHhRlk  IC«; + t;+1g.)Uj.  ♦ r;+1Pi:.]}(ii.+1  +DI>1Af(.+1DiT+1)-l 
X<*w  **<*!♦! 

+ Kim,+ihL  - WhRlh t;,iM/+i^i )’] (fl.+1 

*0=*0  (100) 

"m  = (/  - KWf + r;«f  ♦ *^k/  - ^ JiriiWw ) 

+ pi^iRik  [«;«/ + + 

+ + rf*lR7+l Ti+1  J?i~+1l^i+1  )^,>1  ] Mi+\rfi+\  - Tj+1 

♦Tr*l*ri»wV,w],H«W  +H|>iM,.+1Hf+ir1(z.+i  -Hw*m); 

D0  = 0 (101) 

*<♦1  = )'PiFjM-il1  + (Afrij  [^;p(.pf  + (*;p.f/Y 

mW)"]^}'^  (102) 

A,-  “ A,-  + HTRJ1  TjRJ^Hj  +(Hl'R-'TiR-lHiY  + (J^T^-lff/;  A0  = 0.  (103) 

The main conceptual distinction between the state estimation results here and the
discrete-time analogs derived earlier for the case of state-independent measurement noise
is the appearance of a driving term in Eq. (100) containing the difference between the
observed and expected scatter matrix of the “innovation vector” $(z_{i+1} - H_{i+1}\bar{x}_{i+1})$. This term
is present in the discrete-time version even if the $T_i$ are zero, but it vanishes in the
continuous-time limit in this case. However, this does not happen for nonzero $T$.
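
The nominal covariance sequence of Eqs. (98) and (99), on which the perturbations $D_i$ are built, is the ordinary discrete-time Kalman covariance recursion: $M_{i+1} = F_i P_i F_i^T + Q_i$ followed by $P_{i+1} = M_{i+1} - M_{i+1}H^T(R + HM_{i+1}H^T)^{-1}HM_{i+1}$. A minimal sketch follows, assuming time-invariant matrices for brevity (the report allows them to vary with $i$); the function name and arguments are hypothetical.

```python
import numpy as np


def nominal_covariances(F, Q, H, R, P0, steps):
    """Nominal covariance sequence {P_i}, assuming constant F, Q, H, R.

    Each pass applies the time update M = F P F^T + Q and then the
    measurement update P = M - M H^T (R + H M H^T)^{-1} H M.
    """
    P = [np.asarray(P0, dtype=float)]
    for _ in range(steps):
        M = F @ P[-1] @ F.T + Q
        K = M @ H.T @ np.linalg.inv(R + H @ M @ H.T)
        P.append(M - K @ H @ M)
    return P


# Scalar usage example with F = Q = H = R = 1 and P_0 = 2
one = np.array([[1.0]])
seq = nominal_covariances(one, one, one, one, np.array([[2.0]]), 50)
print(seq[-1][0, 0])
```

In this scalar case the update reduces to $P \mapsto (P + 1)/(P + 2)$, whose fixed point satisfies $P^2 + P - 1 = 0$, so the sequence settles at $(\sqrt{5} - 1)/2$.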


Optimization

If the optimal expected cost-to-go function at epoch $i + 1$ (after $z_{i+1}$ is available) is
of the form

$$J(x, D, i + 1) = \tfrac{1}{2}x^T S_{i+1} x + (\eta_{i+1} + \theta_{i+1})^T x + \operatorname{tr}[(S_{i+1} + Y_{i+1})D] + \tfrac{1}{2}\epsilon_{i+1}$$

to first order in $h$ for $\hat{x}_{i+1} = x$ and $D_{i+1} = D$, where $S_{i+1}$ and $Y_{i+1}$ are symmetric, then
the Principle of Optimality implies that the optimal expected cost-to-go at epoch $i$ is


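$$J(x, D, i) = \min_u E\left\{\tfrac{1}{2}\left(x_i^T A_i x_i + u^T B_i u + \hat{x}_{i+1}^T S_{i+1}\hat{x}_{i+1} + \epsilon_{i+1}\right) + (\eta_{i+1} + \theta_{i+1})^T\hat{x}_{i+1} + \operatorname{tr}[(S_{i+1} + Y_{i+1})D_{i+1}]\right\} \tag{104}$$

to first order in $h$, given that $\hat{x}_i = x$, $u_i = u$, and $D_i = D$. The expectation in Eq. (104)
can be evaluated from the dynamics of $\hat{x}$ and $D$ to first order in $h$ as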

\ [xTAix+uT(Bi  + Gf8MGt)*+€M  + + 

+ («1  + W + G,“>  + G,«  + ^((/  - Hj+1R-\Hi+1Pi+1 ) 

x rM(/  - p^hT^h^  wpF?  + r>  + *;.*) 

+ **VWW  Yi^iHhRlh^  + + 

+ Sj+1  (FjDFj  + r;.u  + *JS)  + AjD]  + | irfTSjtlF.i.  (105) 

Equating  the  u -derivative  of  Eq.  (105)  to  zero  shows  that  this  expectation  is  minimized  if 

« - - W ♦o?»w0<>rlwf  [«,♦,*!*  * Vi  +«,-.i ♦ 

X 'Vi  ))  - >r,<'  - m'm  )y,.i 

* n.i  i >•  woe) 

Substituting Eq. (106) into Eq. (105) to eliminate the minimization operation in Eq. (104),
and equating coefficients of like powers of $x$ and $D$ in the resulting equation shows by
induction on $N - i$ that

$$J(x, D, i) = \tfrac{1}{2}x^T S_i x + (\eta_i + \theta_i)^T x + \operatorname{tr}[(S_i + Y_i)D] + \tfrac{1}{2}\epsilon_i \tag{107}$$


+ GfsmG,)'l3>[('  - pi+\HJ+\R~il\Hi+\  )r,(/  - Hf+iRi+i ^i*ipi*i  )yM 

7«+i  J + W ■ >*,<' 

- Hf+lRl+lHi+lPi+l  )Yi+ 1 ] ; ^ =0  (111) 

e«‘  = ei+l  + fr^iPi  + Si+lPl*lrfllR£lHi+lMi+l ):  eN  = ) (112) 

since $S_i$ and $Y_i$ are symmetric and $\eta_i$ and $\theta_i$ are of order $h$. This implies that the optimal
control law here is given to first order in $h$ by Eqs. (97) through (103), (106), and (108)
through (111), with $u_i$ and $\hat{x}_i$ replacing $u$ and $x$ in Eq. (106).

Role  of  Measurement  Noise 

If  exact  measurements  of  system  state  x are  available  to  the  controller,  then  the 
optimal  expected  cost-to-go  function  can  be  defined  directly  in  terms  of  the  system  state. 
If this function is of the form


to first order in $h$, given that $x_i = x$ and $u_i = u$. It follows from the dynamics of $x$ that
the  expectation  in  Eq.  (113)  is 

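$$\tfrac{1}{2}\left[x^T(A_i + F_i^T S_{i+1}F_i)x + u^T(B_i + G_i^T S_{i+1}G_i)u + \delta_{i+1} + \operatorname{tr}(S_{i+1}Q_i)\right] + \left[G_i^T(S_{i+1}F_i x + \eta_{i+1}) + \operatorname{Tr}(\Gamma_i S_{i+1})\right]^T u + \left[F_i^T\eta_{i+1} + \operatorname{Tr}(\Psi_i S_{i+1})\right]^T x. \tag{114}$$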

Equating  the  u-derivative  of  Eq.  (114)  to  zero  shows  that  this  expectation  is  minimized 
if 

$$u = -(B_i + G_i^T S_{i+1}G_i)^{-1}\left[G_i^T(S_{i+1}F_i x + \eta_{i+1}) + \operatorname{Tr}(\Gamma_i S_{i+1})\right]. \tag{115}$$
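
As a numerical sanity check on this minimization, the sketch below builds the one-step expected cost for perfect measurements directly from its ingredients and confirms that the control $u = -(B + G^TSG)^{-1}[G^T(SFx + \eta) + \operatorname{Tr}(\Gamma S)]$ zeroes its gradient. Several things here are assumptions made only for illustration: the process noise covariance is taken as $Q + 2\sum_k \Gamma_k u_k + 2\sum_k \Psi_k x_k$ with symmetric slices $\Gamma_k$, $\Psi_k$, the vector $\operatorname{Tr}(\Gamma S)$ is taken to have components $\operatorname{tr}(\Gamma_k S)$, and all matrices are random.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2  # hypothetical state and control dimensions


def random_spd(k):
    a = rng.standard_normal((k, k))
    return a @ a.T + k * np.eye(k)


def random_sym_slices(count, k, scale=0.1):
    """Stack of `count` symmetric k x k slices standing in for a noise-gradient tensor."""
    t = scale * rng.standard_normal((count, k, k))
    return 0.5 * (t + t.transpose(0, 2, 1))


F = rng.standard_normal((n, n))
G = rng.standard_normal((n, m))
A, B, S, Q = random_spd(n), random_spd(m), random_spd(n), random_spd(n)
eta = 0.1 * rng.standard_normal(n)
Gamma = random_sym_slices(m, n)  # control-dependent noise intensity gradient
Psi = random_sym_slices(n, n)    # state-dependent noise intensity gradient
x = rng.standard_normal(n)
delta_next = 0.0


def Tr(T, W):
    """Vector whose k-th component is tr(T_k W) (assumed index convention)."""
    return np.einsum("kij,ji->k", T, W)


def expected_cost(u):
    """0.5*(x'Ax + u'Bu) plus the expectation of the next-stage quadratic cost-to-go."""
    mean_next = F @ x + G @ u
    cov_next = (Q + 2.0 * np.einsum("kij,k->ij", Gamma, u)
                  + 2.0 * np.einsum("kij,k->ij", Psi, x))
    return (0.5 * (x @ A @ x + u @ B @ u)
            + 0.5 * mean_next @ S @ mean_next + 0.5 * np.trace(S @ cov_next)
            + eta @ mean_next + 0.5 * delta_next)


u_star = -np.linalg.solve(B + G.T @ S @ G, G.T @ (S @ F @ x + eta) + Tr(Gamma, S))

eps = 1e-6
grad = np.array([(expected_cost(u_star + eps * e) - expected_cost(u_star - eps * e)) / (2 * eps)
                 for e in np.eye(m)])
print(np.abs(grad).max())
```

Because the expected cost is exactly quadratic in $u$ with Hessian $B + G^TSG$, the stationary point is the global minimizer whenever that matrix is positive definite.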

Substituting  Eq.  (115)  into  Eq.  (114)  to  eliminate  the  minimization  operation  of  Eq. 
(113),  and  equating  coefficients  of  like  powers  of  x in  the  resulting  equation  shows  by 
induction  on  N - i that 

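$$J(x, i) = \tfrac{1}{2}x^T S_i x + \eta_i^T x + \tfrac{1}{2}\delta_i \tag{116}$$

to first order in $h$ if $S_i$ and $\eta_i$ are as given by Eqs. (108) and (110), and $\delta_i$ is given by
the recursion

$$\delta_i = \delta_{i+1} + \operatorname{tr}(S_{i+1}Q_i); \quad \delta_N = 0. \tag{117}$$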

Therefore the optimal control law here is as specified by Eqs. (108), (110), and (115)
with the formal replacement of $u$ and $x$ by $u_i$ and $x_i$. Comparing Eq. (106) with Eq.
(115) shows that the optimal control laws for noisy and perfect state measurements are
related to each other in the same way as their continuous-time counterparts in the case
of state-independent measurement noise, with the $Y_i$ here serving as a sequence of
coupling matrixes for the dual control terms.


Asymptotic  Formulas 

A control problem of the form considered in this section can also serve as a discrete-time
approximation to an extension of the continuous-time problem of Eqs. (1) through
(3) with state-dependent measurement noise, with covariance parameter $R(t) + 2\Omega'(t)u +
2T'(t)x$, if $t$ and $i$ are related such that $t = t_0 + i\Delta$, where $\Delta$ is the (constant)
discretization interval, and if
$$\begin{aligned}
F_i &= I + F(t)\Delta \\
G_i &= G(t)\Delta \\
A_i &= A(t)\Delta \\
B_i &= B(t)\Delta \\
Q_i &= Q(t)\Delta \\
\Gamma_i &= \Gamma(t)\Delta \\
\Psi_i &= \Psi(t)\Delta \\
H_i &= H(t) \\
T_i &= \tfrac{1}{\Delta}T(t)
\end{aligned}$$


The case of nonzero $c(t)$ in Eq. (3) is omitted here. If terms of second order in $\Delta$ are
neglected, and if $\dot{\hat{x}}(t)$ is used formally to denote the difference

$$\frac{\hat{x}_{i+1} - \hat{x}_i}{\Delta}$$

and similarly for other such differences, the results of this section reduce to the following
asymptotic form for small $\Delta$, where the “t” argument is suppressed in the following
notation:


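Filter

$$\dot{\hat{x}} = F\hat{x} + Gu + \left\{PH^T\left[I - 2R^{-1}(\Omega'u + T'\hat{x})\right] + 2DH^T\right\}R^{-1}(z - H\hat{x}) + P\,\operatorname{Tr}\left\{R^{-1}TR^{-1}\left[(z - H\hat{x})(z - H\hat{x})^T - \tfrac{1}{\Delta}R\right]\right\}; \quad \hat{x}(t_0) = \bar{x}_0 \tag{118}$$

$$\dot{P} = FP + PF^T + Q - PH^TR^{-1}HP; \quad P(t_0) = P_0 \tag{119}$$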

D — (F  - PHTR-iHyD  + B(PT  - f^R^HP)  + (r  + PHT R'1  SIR-1  HP)’ u 

+ (*  + PHTR-'TR-lHPyx  + {(P\P),PHT  - [ PVFR-iT’  + T"fl-ltf)P]'} 
XB-i(z-«x);  £(t0)  = 0 (120) 


A = 0 + ©'  +©",0  4 p-l+p-l  + HTR-iTR-lH  - A(F  + QP-');  A(t0)  = 0 (121) 


Controller 

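$$u = -B^{-1}\left\{G^T(S\hat{x} + \eta + \theta) + \operatorname{Tr}\left[\Gamma S + (\Gamma + PH^TR^{-1}\Omega R^{-1}HP)Y\right]\right\} \tag{122}$$

$$\dot{S} = -SF - F^TS - A + SGB^{-1}G^TS; \quad S(t_f) = S_f \tag{123}$$

$$\dot{Y} = Y(PH^TR^{-1}H - F) + (H^TR^{-1}HP - F^T)Y - SGB^{-1}G^TS; \quad Y(t_f) = 0 \tag{124}$$

$$\dot{\eta} = -F^T\eta + SGB^{-1}\left[G^T\eta + \operatorname{Tr}(\Gamma S)\right] - \operatorname{Tr}(\Psi S); \quad \eta(t_f) = 0 \tag{125}$$

$$\dot{\theta} = -F^T\theta + SGB^{-1}\left[G^T\theta + \operatorname{Tr}(\Gamma Y + PH^TR^{-1}\Omega R^{-1}HPY)\right] - \operatorname{Tr}(\Psi Y + PH^TR^{-1}TR^{-1}HPY); \quad \theta(t_f) = 0 \tag{126}$$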


Optimal  expected  cost-to-go 

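$$J(x, D, t) = \tfrac{1}{2}x^TSx + (\eta + \theta)^Tx + \operatorname{tr}[(S + Y)D] + \tfrac{1}{2}\epsilon \tag{127}$$

$$\dot{\epsilon} = -\operatorname{tr}(AP + SPH^TR^{-1}HP); \quad \epsilon(t_f) = \operatorname{tr}[S_fP(t_f)] \tag{128}$$

Perfect measurements

$$u = -B^{-1}\left[G^T(Sx + \eta) + \operatorname{Tr}(\Gamma S)\right] \tag{129}$$

$$J(x, t) = \tfrac{1}{2}x^TSx + \eta^Tx + \tfrac{1}{2}\delta \tag{130}$$

$$\dot{\delta} = -\operatorname{tr}(SQ); \quad \delta(t_f) = 0 \tag{131}$$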

These equations agree with those derived earlier for $T$ identically zero. A basically new
phenomenon arises for nonzero $T$, however, when a driving term for $\hat{x}$ appears that contains
the scatter matrix of the measurement vector about its current expected value and
depends explicitly on the length $\Delta$ of the discretization interval.
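
The deterministic matrix functions in these asymptotic formulas can be generated by straightforward numerical integration: the nominal covariance $P$ forward from $t_0$, and the control Riccati variable $S$ and the coupling variable $Y$ backward from $t_f$. The sketch below does this for a scalar, time-invariant system with simple Euler steps. The scalar reduction, the parameter values in the usage line, and the function name are assumptions made for illustration, not the report's numerical method.

```python
import numpy as np


def integrate_scalar(F, G, A, B, Q, H, R, P0, Sf, t0, tf, steps):
    """Euler integration of scalar versions of the P, S, and Y equations.

    dP/dt = 2FP + Q - (PH)^2/R          forward,  P(t0) = P0
    dS/dt = -2FS - A + (SG)^2/B         backward, S(tf) = Sf
    dY/dt = 2Y(PH^2/R - F) - (SG)^2/B   backward, Y(tf) = 0
    """
    dt = (tf - t0) / steps
    t = np.linspace(t0, tf, steps + 1)
    P = np.empty(steps + 1)
    S = np.empty(steps + 1)
    Y = np.empty(steps + 1)

    P[0] = P0
    for k in range(steps):  # forward sweep for the nominal covariance
        P[k + 1] = P[k] + dt * (2.0 * F * P[k] + Q - (P[k] * H) ** 2 / R)

    S[-1], Y[-1] = Sf, 0.0
    for k in range(steps, 0, -1):  # backward sweep for S and Y
        dS = -2.0 * F * S[k] - A + (S[k] * G) ** 2 / B
        dY = 2.0 * Y[k] * (P[k] * H ** 2 / R - F) - (S[k] * G) ** 2 / B
        S[k - 1] = S[k] - dt * dS
        Y[k - 1] = Y[k] - dt * dY
    return t, P, S, Y


t, P, S, Y = integrate_scalar(F=0.0, G=1.0, A=1.0, B=1.0, Q=1.0, H=1.0, R=1.0,
                              P0=0.5, Sf=0.0, t0=0.0, tf=5.0, steps=1000)
print(P[-1], S[0], Y[0])
```

Note that $Y$ depends on $P$ but not the reverse, so the forward sweep must be completed and stored before the backward sweep begins.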


In either case, there is a conceptual difference from the continuous-time results
derived earlier. As before, the discretization increment $\Delta$ must be small enough that
$\Delta \ll h$ in order to justify the retention of terms of order $h$ but not of order $\Delta$ in the
asymptotic “differential” equations. Since terms of order $h^2$ were neglected in the
underlying discrete-time analysis, however, these asymptotic equations are also only
meaningful if $\Delta \gg h^2$, an additional constraint that was absent from the earlier
continuous-time results. Since the standard deviations of the measurement noise components
are of order $\Delta^{-1/2}$ for small $\Delta$, this additional constraint is equivalent to the condition
that the measurement noise magnitude be small compared to $1/h$ with high probability,
or equivalently that T'H^z as well as T'x remain of order $h$. Furthermore, for a short
discretization increment $\Delta$, the random variables

$$(z - H\hat{x})(z - H\hat{x})^T - \tfrac{1}{\Delta}R$$

are  statistically  independent  at  different  time  steps  to  the  degree  of  accuracy  of  the 
analysis  here,  and  have  zero  mean  and  covariances  of  order  unity.  Hence,  the  cumulative 
contribution  over  an  interval  of  order  unity  of  the  scatter  matrix  driving  term 

$$P\,\operatorname{Tr}\left\{R^{-1}TR^{-1}\left[(z - H\hat{x})(z - H\hat{x})^T - \tfrac{1}{\Delta}R\right]\right\}$$

in one filter equation, Eq. (118), is approximately a zero-mean random variable with
covariance of order $h^2/\Delta$, since this interval contains the sum of $1/\Delta$ such increments.
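
The $h^2/\Delta$ scaling can be illustrated with a small Monte Carlo experiment. The model below is a deliberately crude scalar stand-in and is an assumption throughout: the innovation $v$ is drawn as $N(0, r/\Delta)$ at each step, the per-step contribution to the estimate is taken as $\Delta \cdot h\,(v^2 - r/\Delta)/r^2$, and the contributions are summed over $1/\Delta$ independent steps. Under these assumptions each increment has variance $2h^2/r^2$, so the sum has variance $2h^2/(r^2\Delta)$.

```python
import numpy as np


def cumulative_scatter_variance(h, delta, r=1.0, trials=20000, seed=0):
    """Sample variance of the summed scatter-term increments over a unit interval.

    Scalar illustration only: v ~ N(0, r/delta) is the innovation at each step,
    and each step contributes delta * h * (v**2 - r/delta) / r**2.
    """
    rng = np.random.default_rng(seed)
    steps = int(round(1.0 / delta))
    v = rng.normal(0.0, np.sqrt(r / delta), size=(trials, steps))
    increments = delta * h * (v ** 2 - r / delta) / r ** 2
    return increments.sum(axis=1).var()


for delta in (0.04, 0.01):
    estimate = cumulative_scatter_variance(h=0.1, delta=delta)
    print(delta, estimate, 2 * 0.1 ** 2 / delta)
```

Shrinking $\Delta$ at fixed $h$ makes the accumulated scatter contribution grow without bound, which is the numerical face of the requirement $\Delta \gg h^2$.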

This means that the constraint $\Delta \gg h^2$ is also equivalent to the requirement that the
effect of the scatter matrix driving term on the state estimate $\hat{x}$ remain small compared
to unity (with high probability). If this inequality is reversed, in fact, the scatter of the
state measurements dominates all the other statistics in the state estimate $\hat{x}$ generated by
the filter equations, Eqs. (118) through (121), for nonzero $T$, which seems suspicious for
realistic applications. This phenomenon suggests that the additional constraint in this
context reflects a practical limitation in constructing an appropriate measurement noise
representation.

The  measurement  noise  in  actual  applications  is  never  exactly  white  anyway,  but 
rather  has  limited  bandwidth  and  nonzero  relaxation  time.  Thus  it  is  more  realistic  to 
imagine  state  measurements  that  are  ordinarily  approximated  as  being  corrupted  by  white 
noise  as  having  been  averaged  by  some  kind  of  “prefilter”  (say  a sample-and-hold  filter) 
before  reaching  the  controller.  As  long  as  the  noise  is  state-independent,  however,  the 
state-estimation  results  do  not  depend  significantly  on  the  exact  form  of  this  prefilter  as 
long  as  its  sampling  period  is  short  compared  to  the  system  time  constants  (and  to  h in 
the  present  context),  and  no  serious  error  is  introduced  by  disregarding  its  effects  and 
treating  the  measurement  noise  as  white. 

This ceases to be true for the sort of measurement noise state-dependence considered
here, where the state-estimation equation, Eq. (118), would depend explicitly on the
sampling period $\Delta$ of the prefilter. This means that an additional parameter of the
measurement process, normally unimportant in practice (namely a time constant equivalent
to the sampling period of a sample-and-hold prefilter), must be specified in this context
to achieve state-estimation results accurate to order $h$. Such a time constant may be
readily available, however, in applications that are truly digital. Also, the results here
would be valid only for measurement noise state dependence sufficiently weak that the
dependency parameter $h$ is small compared to the square root of this prefilter time
constant $\Delta$ in properly adjusted units. In fact, these results suggest that if this state
dependence is strong enough and if the measurement noise values become independent
over a short enough time interval, then the scatter of the measurements really does contain
more information about the state than their average value, which would be a drastic
departure from the usual filtering situation. The analyses here break down at this point,
however, and do not verify this conjecture.



WARREN  W.  WILLMAN 

Fig.  2 — Relative  motion  coordinates 


A NUMERICAL  EXAMPLE 

A numerical illustration of some of the foregoing ideas can be obtained from a
planar free-space interception problem in which a homing interceptor has noisy measurements
of a target’s relative angular position. Any out-of-plane motions are assumed to be
controlled independently. The problem developed here is too highly idealized to serve
any useful design purpose, but hopefully is still indicative of the basic character of a
realistic intercept situation.

The interceptor is assumed to be initially on a collision course with the target, which
is subsequently perturbed by a white-noise acceleration along its trajectory, perhaps
representing random drag fluctuations. The goal of the interceptor is to minimize a
weighted sum of the integrated square of its maneuvering thrust and the square of the
distance of closest approach to the target. It is convenient to adopt the relative coordinate
system shown in Fig. 2, with the origin fixed at the nominal target position. Random
forces acting on the interceptor are disregarded here. Such forces would also be significant
in reality, but their inclusion here would only complicate the problem without changing
its basic character. Also, the interceptor’s control acceleration $u$ is constrained for
simplicity to be perpendicular to its current relative velocity (not quite optimal for
non-infinitesimal $\theta$). With this constraint, $u$ can be regarded as a scalar, the interceptor’s
speed is a constant $s$ in relative coordinates, and the interceptor path $[\bar{x}(t), \bar{y}(t)]$
generated by an otherwise general nominal control $\bar{u}(t)$ obeys the equations
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m(t)  = distance  the  actual  target  location  at  time  tf  would  be  from  the  closest  point 
on  the  interceptor’s  path  if  no  force  were  applied  to  either  vehicle  after 
time  t 


T(t)  = time  at  which  the  interceptor  would  be  at  this  point  on  this  path. 
It  follows  from  these  definitions  that 


T - 1 

m = (tf~t)  u + (G-f)sin  (o  + 0)w 

' \}f  ~ ‘ 


9 = u/s 


where $w$ is the random in-track target acceleration, taken to be a zero-mean Gaussian
white noise (GWN) process with constant intensity parameter $q$. If these dynamics are
approximated by neglecting the departures of the ratio $(T - t)/(t_f - t)$ from unity and if
$\bar{m}(t)$ denotes the history of $m$ generated by $\bar{u}$, then the deviations from the nominal path
reduce to

$$\dot{\tilde{m}} = (t_f - t)\tilde{u} + (t_f - t)\sin(\sigma + \bar{\theta} + \tilde{\theta})\,w$$

$$\dot{\tilde{\theta}} = \tilde{u}/s$$

where $\tilde{m} = m - \bar{m} + \bar{m}(t_f)$, $\tilde{\theta} = \theta - \bar{\theta}$, and $\tilde{u} = u - \bar{u}$. It is assumed that the actual time
and distance of closest approach are approximately $T(t_f)$ and $m(t_f)$ for a reasonable
nominal path generating $t_f$ and $\bar{m}$. If the criterion to be minimized is of the form


a > 0 


and  the  deviations  from  the  nominal  are  small,  then 


tf]u2(tf). 


Assuming  that  deviations  of  T(tf)  from  tf  are  negligible  compared  to  other  deviations 
from  the  nominal  makes  this  equivalent  to  minimizing  the  criterion 


$$J = \tfrac{1}{2}E\left\{a\,\tilde{m}^2(t_f) + \int_{t_0}^{t_f}(\tilde{u}^2 + 2\bar{u}\tilde{u})\,dt\right\}$$


if  second-order  terms  in  the  deviations  are  disregarded. 

For initial conditions it is assumed that $m(t_0) = \bar{m}(t_0)$ and $\theta(t_0) = \bar{\theta}(t_0)$, so
initial values of the state variables are specified as $\tilde{m}(t_0) = \bar{m}(t_f)$ and $\tilde{\theta}(t_0) = 0$. It also
follows that $\tilde{\theta}(t)$ remains known exactly. The effects of noisy angular position measurements
on the estimate of $\tilde{m}$ can be represented only approximately by noisy measurements
of $\tilde{m}$ itself. To make such an approximation, we first assume that the major source of
error is the uncertainty in the relative velocity derived from the angle measurements.
Next we consider the one-coordinate free-space system of lateral motions


$$\xi \sim \mathrm{GWN}(0, q) \quad (q \text{ a constant})$$

with closing speed $s$ and exactly specified initial conditions, for which the state covariance
matrix components evolve with elapsed time $\tau$ as

$$P_{22} = q\tau$$

$$P_{12} = \tfrac{1}{2}q\tau^2$$

$$P_{11} = \tfrac{1}{3}q\tau^3.$$
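
These growth rates are the standard ones for a double integrator driven by white acceleration noise, and they can be confirmed by integrating the covariance equation $\dot{P} = FP + PF^T + Q$ from zero initial covariance. The sketch below assumes the lateral motion is such a double integrator, with state ordered as (position, velocity); that ordering, the values $q = 2$ and $\tau = 3$, and the function name are illustrative assumptions.

```python
import numpy as np


def propagate_covariance(q, tau, steps=20000):
    """Euler integration of dP/dt = F P + P F^T + Q for a double integrator.

    State is (position, velocity); white noise of intensity q drives the
    velocity, and the initial conditions are exactly known (P = 0).
    """
    F = np.array([[0.0, 1.0], [0.0, 0.0]])
    Q = np.array([[0.0, 0.0], [0.0, q]])
    P = np.zeros((2, 2))
    dt = tau / steps
    for _ in range(steps):
        P = P + dt * (F @ P + P @ F.T + Q)
    return P


q, tau = 2.0, 3.0
P = propagate_covariance(q, tau)
print(P[1, 1], q * tau)            # velocity variance vs. q*tau
print(P[0, 1], q * tau ** 2 / 2)   # cross covariance vs. q*tau^2/2
print(P[0, 0], q * tau ** 3 / 3)   # position variance vs. q*tau^3/3
```

The cubic growth of the position variance is what makes a short observation interval attractive when process noise is present.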

On the other hand, if terminal (lateral) position $\lambda(t_f)$ is estimated solely from noisy
measurements of $\lambda$ during a time interval $(t, t + \tau)$ short enough that the process noise
disturbances are negligible, it is routine to show that almost the same accuracy is obtained
for $\tau \ll t_f - t$ by lumping the observations in the outer two quarters of this interval at
the corresponding endpoints. If the $\lambda$ measurements are derived from line-of-sight data
with noise intensity $r$, each lumped position observation has a linear variance of

4 rs2(tf-t)2 


s-  ,“‘ance  of  the  te™i™1 


i*r1 


4rs2(tf-t)2 


if only errors due to velocity uncertainty are considered. Choosing $\tau$ to match the
variance of the lumped position measurements and the disturbances (of position) from
neglected process noise during the same observation interval gives


1 , 4™2(tf  - t)2 

— nr6  = ! 


■‘t/r 


°'  H,f)  thC"  4*,r  ,!f  - ‘I3-  Bu"  In  the  absence  of  process 

noise,  the  same  vanance  would  be  obtained  from  noisy  observations  of  \(tf)  itself  over 
this  time  interval  if  the  noise  intensity  were  7 


4»vf  (tr()3- 
In the interception problem, then, it is a reasonable approximation to endow the interceptor
with a continuous measurement $z$ of state variable $\tilde{m}$, which is basically a predicted
terminal miss distance, such that

z(t)  = ift(t)  + v(t),  v is  GWn[o,  4s  (tf  - t)3 J 

where  r is  the  noise  intensity  of  the  line-of-sight  measurements  and  q now  denotes  the 
intensity  of  the  process  noise  component  lateral  to  the  current  nominal  interceptor 
velocity  in  relative  coordinates. 

With  these  approximations,  the  intercept  problem  reduces  to  the  following  with 
respect  to  the  postulated  nominal  trajectory: 


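$$\dot{\tilde{m}} = (t_f - t)\tilde{u} + w; \quad \tilde{m}(t_0) = \bar{m}(t_f)$$

$$\dot{\tilde{\theta}} = \tilde{u}/s; \quad \tilde{\theta}(t_0) = 0 \qquad \text{(dynamics)}$$

$$J = \tfrac{1}{2}E\left\{a\,\tilde{m}^2(t_f) + \int_{t_0}^{t_f}(\tilde{u}^2 + 2\bar{u}\tilde{u})\,dt\right\} \qquad \text{(criterion to be minimized)}$$

$$z = \tilde{m} + v \qquad \text{(state measurements)}$$

where

$$w \sim \mathrm{GWN}\left[0,\; q(t_f - t)^2\sin^2(\sigma + \bar{\theta}) + 2\psi\tilde{\theta}\right]$$

$$\psi = q(t_f - t)^2\sin(\sigma + \bar{\theta})\cos(\sigma + \bar{\theta})$$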
v * GWn|o,  4s  sin  (a  + 0)(tf  - 1)3] 


with $\dot{\bar{\theta}} = \bar{u}/s$, and where second-order terms in deviations from the nominal have been
neglected. This is a control problem of the form considered above, with state variables
$\tilde{m}$ and $\tilde{\theta}$. The only nonzero covariance perturbation parameter is

^mmd  = " 02  sin  (o  + 0)  cos  (a  + 0). 

Most  of  the  equation  components  determining  the  optimal  control  in  this  case  are  trivial; 
the  rest  reduce  to  the  following,  where  tildes  are  suppressed  in  the  notation: 


u = -(t/-t)(Smmm+«m)---  -u 


1 + 3 a(tf-t)3 


N„ 
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^mm  ~ ^mm  ) 


2s  sin  (o  +6)(tf  - t)a 


i Nmm  (*f)  m « 


p2 

mm 


4im  =q(tr-O2sin2(a  + 0) 

4s  sin  (a  + 0)(*/ - O3 

*m=$nm<*r  t)2dm  +Smm(t/-t)(y  -u);  <t>m(tf)  = 0 
<>0  ’s-Nmm{tf-t)tymm9\  0fl(^)  = O 


^mm («0)  “ ® 


rh-(tf-t)u  + 

4s  sin  (a  + 0)(tf-t)3 
d=u/s;  d(t0)  = 0 


Mt0)  = m(f/) 


Dmm  = 9*mm# - 


n P 

*^mm 1 mm 


Qr 


Dmmtt o)  _ 


2s  y — sin  (a  + 0)(tf  - 0 


This solution can be incorporated in an iterative algorithm that gives the new
nominal control generating the nominal path for the next iteration as

$$\bar{u}_{\mathrm{NEW}} = \bar{u}_{\mathrm{OLD}} + \tilde{u}$$

where $\tilde{u}$ is generated by the optimal control law of the current iteration. It is helpful for
this purpose to refine the values of $\bar{m}(t_f)$ and $t_f$ for the next iteration by generating
$\bar{m}(t_f)$ as the distance from the origin to the tangent to the interceptor’s old nominal
trajectory at time $t_f$, where this nominal is generated by the exact equations

x ~ “OLD  ®n  ® OLD  * *(*0 ) = ® 

y = ~“OLD  cos®OLD>  y(f0>  specified 

and  then  replacing  the  value  of  tf  by  the  time  of  closest  approach  to  the  origin  on  this 
tangent,  assuming  that  the  interceptor  traverses  it  at  speed  s. 

Figure  3 shows  some  numerical  results  of  this  iterative  procedure  for  a nominally 
right-angle  interception  in  absolute  coordinates.  The  deterministic  intercept  trajectory 
(zero  control)  was  used  as  the  nominal  for  the  initial  iteration.  Only  mean  sample  path 
results  are  shown  here,  which  would  be  the  pertinent  information  for  nominal  trajectory 
analysis.  The  mean  sample  path  does  what  one  might  expect;  it  departs  from  the 
deterministic  intercept  path  for  a more  nearly  head-on  terminal  approach,  which  appears 
most  clearly  in  relative  coordinates.  The  certainty-equivalent  mean  sample  path  is  also 
shown  to  display  the  contribution  of  the  dual-control  effect  here.  The  corresponding 
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Fig.  4 — Dual-control  coupling  parameter 


values  of  the  one  nontrivial  component  of  the  “dual  control  coupling  matrix”  are  shown 
in  Fig.  4.  These  particular  results  were  obtained  with  a terminal  miss  weight  of  a = 1000. 
This  problem  is  singular  at  the  terminal  time  and  the  iterative  algorithm  diverged  for 
larger  values  of  this  weight. 